Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON filtering error after horizontal connection #25813

Closed
linhantao opened this issue Jun 29, 2021 · 4 comments · Fixed by #35759
Closed

JSON filtering error after horizontal connection #25813

linhantao opened this issue Jun 29, 2021 · 4 comments · Fixed by #35759
Assignees
Labels
affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.0 affects-6.1 This bug affects the 6.1.x(LTS) versions. component/json severity/major sig/execution SIG execution type/bug The issue is confirmed as a bug.

Comments

@linhantao
Copy link

linhantao commented Jun 29, 2021

TIDB version : V5.1 / V5.0
error message:[Err] 3140 - Invalid JSON text: The document root must not be followed by other values.

When left join is used, JSON value filtering is performed on the right table, and an error occur,
When using a right join, JSON value filtering is performed on the left table and an error occurs.
for example:
image

When a left join is used, JSON value filtering is performed on the left table without error.Such as:
image

However, the values in the JSON filter after the concatenation are all numeric, so the query can be executed without letters
for example:
image
image

@linhantao linhantao added the type/bug The issue is confirmed as a bug. label Jun 29, 2021
@jingshanglu
Copy link
Contributor

@linhantao Thanks for your feedback, can you add some information about the table structure and the steps to reproduce?

@linhantao
Copy link
Author

@jingshanglu repeat the steps as follows:
1、create one table :

create table test_json (id varchar(100), 家庭住址 json,other varchar(100));

2、insert one data:

insert into test_json values('1','{"detail": "", "list": [{"id": "41bb58cab2954a768b7a2e85e82ef369", "value": "湖北省"}, {"id": "0dbbcdb6b6524f5fb47585e49fda44be", "value": "武汉市"}, {"id": "0e6f75f3bb1e46b8a329fc9aa62187f4", "value": "东西湖区"}]}','其他数据');

3、error sql:

select * from test_json A left join (select * from test_json where false) B on A.家庭住址 =B.家庭住址 where B.家庭住址 ->'$.list[0].id'='0asdsss'

image

4、 success sql:

select * from test_json A left join (select * from test_json where false) B on A.家庭住址 =B.家庭住址 where A.家庭住址 ->'$.list[0].id'='0asdsss'

image

select * from test_json A left join (select * from test_json where false) B on A.家庭住址 =B.家庭住址 where B.家庭住址 ->'$.list[0].id'='122'

image

@jebter jebter added affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. labels Jan 11, 2022
@VelocityLight VelocityLight added the affects-6.1 This bug affects the 6.1.x(LTS) versions. label May 20, 2022
@solotzg
Copy link
Contributor

solotzg commented May 26, 2022

There is something wrong when parsing binary from json str

[2022/05/26 02:39:06.733 +00:00] [INFO] [conn.go:1149] ["command dispatched failed"] [conn=1396212641508098479] [connInfo="id:1396212641508098479, addr:172.26.0.1:60152 status:10, collation:utf8_general_ci, user:root"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="select * from test_json A left join (select * from test_json where false) B on A.addr =B.addr where B.addr ->'$.list[0].id'='0asdsss'"] [txn_mode=PESSIMISTIC] [timestamp=433463688357478401] [err="[json:3140]Invalid JSON text: The document root must not be followed by other values.
github.com/pingcap/errors.AddStack
    /go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/errors.go:174
github.com/pingcap/errors.(*Error).GenWithStackByArgs
    /go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/normalize.go:164
github.com/pingcap/tidb/types/json.ParseBinaryFromString
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/types/json/binary.go:403
github.com/pingcap/tidb/expression.(*builtinCastStringAsJSONSig).vecEvalJSON
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/builtin_cast_vec.go:795
github.com/pingcap/tidb/expression.(*ScalarFunction).VecEvalJSON
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/scalar_function.go:78
github.com/pingcap/tidb/expression.(*builtinEQJSONSig).vecEvalInt
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/builtin_compare_vec_generated.go:1124
github.com/pingcap/tidb/expression.(*ScalarFunction).VecEvalInt
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/scalar_function.go:48
github.com/pingcap/tidb/expression.EvalExpr
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/expression.go:566
github.com/pingcap/tidb/expression.VecEvalBool
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/expression.go:361
github.com/pingcap/tidb/expression.vectorizedFilter
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/chunk_executor.go:489
github.com/pingcap/tidb/expression.VectorizedFilterConsiderNull
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/chunk_executor.go:401
github.com/pingcap/tidb/expression.VectorizedFilter
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/expression/chunk_executor.go:379
github.com/pingcap/tidb/executor.(*SelectionExec).Next
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/executor.go:1420
github.com/pingcap/tidb/executor.Next
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/executor.go:319
github.com/pingcap/tidb/executor.(*recordSet).Next
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/executor/adapter.go:151
github.com/pingcap/tidb/server.(*tidbResultSet).Next
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/driver_tidb.go:323
github.com/pingcap/tidb/server.(*clientConn).writeChunks
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:2212
github.com/pingcap/tidb/server.(*clientConn).writeResultset
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:2163
github.com/pingcap/tidb/server.(*clientConn).handleStmt
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:2044
github.com/pingcap/tidb/server.(*clientConn).handleQuery
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1876
github.com/pingcap/tidb/server.(*clientConn).dispatch
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1371
github.com/pingcap/tidb/server.(*clientConn).Run
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:1121
github.com/pingcap/tidb/server.(*Server).onConn
    /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:559
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1571"]
MySQL [test]> explain select * from test_json A left join (select * from test_json where false) B on A.addr =B.addr where B.addr ->'$.list[0].id'='0asdsss';
+-------------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------+
| id                            | estRows | task      | access object | operator info                                                                       |
+-------------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------+
| Selection_12                  | 0.80    | root      |               | eq(json_extract(test.test_json.addr, "$.list[0].id"), cast("0asdsss", json BINARY)) |
| └─HashJoin_13                 | 1.00    | root      |               | left outer join, equal:[eq(test.test_json.addr, test.test_json.addr)]               |
|   ├─Selection_17(Build)       | 0.00    | root      |               | not(isnull(cast(test.test_json.addr, var_string(4294967295))))                      |
|   │ └─TableDual_18            | 0.00    | root      |               | rows:0                                                                              |
|   └─TableReader_16(Probe)     | 1.00    | root      |               | data:TableFullScan_15                                                               |
|     └─TableFullScan_15        | 1.00    | cop[tikv] | table:A       | keep order:false, stats:pseudo                                                      |
+-------------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------+
6 rows in set (0.01 sec)

But for the normal one, eq(json_extract(test.test_json.addr, "$.list[0].id"), cast("0asdsss", json BINARY)) is pushed down to tikv.

MySQL [test]> explain select * from test_json A left join (select * from test_json where false) B on A.addr =B.addr where A.addr ->'$.list[0].id'='0asdsss';
+-----------------------------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| id                          | estRows | task      | access object | operator info                                                                                                                                       |
+-----------------------------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| HashJoin_12                 | 0.80    | root      |               | left outer join, equal:[eq(test.test_json.addr, test.test_json.addr)]                                                                               |
| ├─Selection_17(Build)       | 0.00    | root      |               | eq(json_extract(test.test_json.addr, "$.list[0].id"), cast("0asdsss", json BINARY)), not(isnull(cast(test.test_json.addr, var_string(4294967295)))) |
| │ └─TableDual_18            | 0.00    | root      |               | rows:0                                                                                                                                              |
| └─TableReader_16(Probe)     | 0.80    | root      |               | data:Selection_15                                                                                                                                   |
|   └─Selection_15            | 0.80    | cop[tikv] |               | eq(json_extract(test.test_json.addr, "$.list[0].id"), cast("0asdsss", json BINARY))                                                                 |
|     └─TableFullScan_14      | 1.00    | cop[tikv] | table:A       | keep order:false, stats:pseudo                                                                                                                      |
+-----------------------------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)

@LittleFall
Copy link
Contributor

LittleFall commented Jun 27, 2022

Reproduce

minimal reproduce:

create table t(a json);
insert into t values('{"id": "ish"}');
select * from t t1 left join t t2 on t1.a=t2.a where t2.a='ish';
+--------------------------------+---------+-----------+---------------+--------------------------------------------------------------+
| id                             | estRows | task      | access object | operator info                                                |
+--------------------------------+---------+-----------+---------------+--------------------------------------------------------------+
| Selection_8                    | 0.80    | root      |               | eq(test.t.a, cast("ish", json BINARY)) |
| └─HashJoin_9                   | 1.00    | root      |               | left outer join, equal:[eq(test.t.a, test.t.a)]              |
|   ├─TableReader_15(Build)      | 0.80    | root      |               | data:Selection_14                                            |
|   │ └─Selection_14             | 0.80    | cop[tikv] |               | not(isnull(cast(test.t.a, var_string(4294967295))))          |
|   │   └─TableFullScan_13       | 1.00    | cop[tikv] | table:t2      | keep order:false, stats:pseudo                               |
|   └─TableReader_12(Probe)      | 1.00    | root      |               | data:TableFullScan_11                                        |
|     └─TableFullScan_11         | 1.00    | cop[tikv] | table:t1      | keep order:false, stats:pseudo                               |
+--------------------------------+---------+-----------+---------------+--------------------------------------------------------------+
7 rows in set (0.00 sec)

this issue will reproduce when :

  1. plan include pattern eq(some_json, cast("some_string", json))
  2. this pattern is executed by tidb executor (not tikv)
  3. logical optimize rule PredicatePushDown is triggered (usually adding a join will enable it)

Root Cause

actually 0asdsss or ish is not a valid JSON, but "ish" is a valid JSON, we can check it by executing SQL SELECT CAST('ish' AS json); and SELECT CAST('"ish"' AS json);.

mysql> SELECT CAST('"ish"' AS json);
+-----------------------+
| CAST('"ish"' AS json) |
+-----------------------+
| "ish"                 |
+-----------------------+
1 row in set (0.00 sec)

mysql> SELECT CAST('ish' AS json);
ERROR 3140 (22032): Invalid JSON text: The document root must not be followed by other values.

phase planBuilder:

  1. When comparing JSON with string eq(some_json, some_string), tidb will cast this string to JSON first.
  2. Tidb knows that most strings are not valid JSON, so when it detected the compare between string and json, it will remove the flag ParseToJSONFlag so executor won't really parse this string.
    https://github.com/pingcap/tidb/blob/master/expression/builtin_compare.go#L1741-L1751
func (c *compareFunctionClass) generateCmpSigs(ctx sessionctx.Context, args []Expression, tp types.EvalType) (sig builtinFunc, err error) {
	if tp == types.ETJson {
		// In compare, if we cast string to JSON, we shouldn't parse it.
		for i := range args {
			DisableParseJSONFlag4Expr(args[i])
		}
	}

phase logicalOptimize
3. unfortunately, logical optimize rule PredicatePushDown has a potential bug, which will unexpectedly reset the flag of expressions. #35759 has more information about it.

phase executor
4. when executor builtinCastStringAsJSONSig is working, it checks whether the ParseToJSONFlag is set. if not set, it just do a simple binary convert. if set, it will parse ish, so tidb report.
https://github.com/pingcap/tidb/blob/master/expression/builtin_cast_vec.go#L787-L810

Fix

#35759 will fix this issue.

suggest closing the cherry-picks of this pr, because it has a little effect and difficult trigger condition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.0 affects-6.1 This bug affects the 6.1.x(LTS) versions. component/json severity/major sig/execution SIG execution type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants