Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPM: the captured binding has _utf8mb4 encoding #21475

Closed
ChenPeng2013 opened this issue Dec 3, 2020 · 10 comments
Closed

SPM: the captured binding has _utf8mb4 encoding #21475

ChenPeng2013 opened this issue Dec 3, 2020 · 10 comments
Assignees
Labels

Comments

@ChenPeng2013
Copy link
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

use test;
drop table if exists t;
create table t(name varchar(25), index idx(name));
package main

import (
	"database/sql"
	_ "github.com/go-sql-driver/mysql"
)

func main() {
	dsn := "root:@tcp(192.168.228.83:4000)/test?charset=utf8&parseTime=True"

	db, err := sql.Open("mysql", dsn)
	if err != nil {
		panic(err)
	}
	defer db.Close()
	s := `update t set name = "hello" where name <= "abc"`
	if _, err := db.Exec(s); err != nil {
		panic(err)
	}
	if _, err := db.Exec(s); err != nil {
		panic(err)
	}
}
admin capture bindings;
show global bindings;

2. What did you expect to see? (Required)

mysql> show global bindings;
+---------------------------------------+--------------------------------------------------------------------------------------------------------------+------------+--------+-------------------------+-------------------------+---------+-----------+---------+
| Original_sql                          | Bind_sql                                                                                                     | Default_db | Status | Create_time             | Update_time             | Charset | Collation | Source  |
+---------------------------------------+--------------------------------------------------------------------------------------------------------------+------------+--------+-------------------------+-------------------------+---------+-----------+---------+
| update t set name = ? where name <= ? | UPDATE /*+ use_index(@`upd_1` `test`.`t` `idx`)*/ `t` SET `name`='hello' WHERE `name`<='abc' | test       | using  | 2020-12-03 10:00:35.577 | 2020-12-03 10:00:35.577 |         |           | capture |
+---------------------------------------+--------------------------------------------------------------------------------------------------------------+------------+--------+-------------------------+-------------------------+---------+-----------+---------+
1 row in set (0.00 sec)

3. What did you see instead (Required)

mysql> show global bindings;
+---------------------------------------+--------------------------------------------------------------------------------------------------------------+------------+--------+-------------------------+-------------------------+---------+-----------+---------+
| Original_sql                          | Bind_sql                                                                                                     | Default_db | Status | Create_time             | Update_time             | Charset | Collation | Source  |
+---------------------------------------+--------------------------------------------------------------------------------------------------------------+------------+--------+-------------------------+-------------------------+---------+-----------+---------+
| update t set name = ? where name <= ? | UPDATE /*+ use_index(@`upd_1` `test`.`t` `idx`)*/ `t` SET `name`=_UTF8MB4'hello' WHERE `name`<=_UTF8MB4'abc' | test       | using  | 2020-12-03 10:00:35.577 | 2020-12-03 10:00:35.577 |         |           | capture |
+---------------------------------------+--------------------------------------------------------------------------------------------------------------+------------+--------+-------------------------+-------------------------+---------+-----------+---------+
1 row in set (0.00 sec)

4. What is your TiDB version? (Required)

Release Version: v4.0.0-beta.2-1702-g0c3c4c588
Edition: Community
Git Commit Hash: 0c3c4c588aa52b5bc79edfed9d6f1431d9cd2e2b
Git Branch: master
UTC Build Time: 2020-12-03 02:55:39
GoVersion: go1.15.3
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
@ChenPeng2013 ChenPeng2013 added the type/bug The issue is confirmed as a bug. label Dec 3, 2020
@xiongjiwei
Copy link
Contributor

I think this is expected. the sql after restore will have charset information. it is by-design

@morgo
Copy link
Contributor

morgo commented Dec 3, 2020

I think it's a bug. It should be able to rely on charset and collation, but these columns are empty. For a similar MySQL example, here is CREATE VIEW:

mysql [localhost:8022] {msandbox} (test) > CREATE VIEW v1 AS SELECT 'acdc';
Query OK, 0 rows affected (0.00 sec)

mysql [localhost:8022] {msandbox} (test) > show create view v1;
+------+---------------------------------------------------------------------------------------------------------------------+----------------------+----------------------+
| View | Create View                                                                                                         | character_set_client | collation_connection |
+------+---------------------------------------------------------------------------------------------------------------------+----------------------+----------------------+
| v1   | CREATE ALGORITHM=UNDEFINED DEFINER=`msandbox`@`localhost` SQL SECURITY DEFINER VIEW `v1` AS select 'acdc' AS `acdc` | utf8mb4              | utf8mb4_0900_ai_ci   |
+------+---------------------------------------------------------------------------------------------------------------------+----------------------+----------------------+
1 row in set (0.00 sec)

@eurekaka
Copy link
Contributor

eurekaka commented Dec 4, 2020

This issue is pretty complicated.

First of all, an embarrassed fact is that charset and collation of bindings are not used now actually. The bindings are only used in 2 scenarios:

  1. in Optimize() to bind a plan for a SQL. In this case, we only care about the hints of the binding, specifically, we directly inject the hints into the parsed StmtNode of the query, which has right charset / collation settings hence.
  2. in baseline evolution. In this case, the BindSQL of the new binding is generated by StmtNode::Restore(), which would explicitly add charset / collation for string constants in the query, hence during the verification of the BindSQL, it can be parsed with correct charset / collation then. The only problem here is that the BindSQL may not be able to match any original_sql because the latter may have no explicit charset / collation specification for string parameters.

Second, back to this issue itself, the connection charset is utf8, and the string constants in the query have no explicit charset / collation specification, so the restored BindSQL should have utf8 charset specification, instead of the current utf8mb4. This is because statements_summary does not store the charset / collation for the sample SQL, so baseline capture can only use server default charset / collation, which is utf8mb4.

Third, one solution for this issue is to fix the wrong explicit charset / collation specification in BindSQL, i.e, change utf8mb4 to utf8 for this case, and the charset / collation of the bindings can stay unused, since they only impact baseline evolution, which is not a complete feature yet.

Fourth, a better / complete solution would be storing the correct charset / collation in bindings, and use them in baseline evolution. That requires more code changes, e.g, Restore should be able to skip generating explicit charset / collation specification if required.

All in all, this issue would at least not effect the correctness of SQL bind usage, it is more related with baseline evolution. I am going to mark it as low priority.

@wjhuang2016
Copy link
Member

Maybe we can add an option to Restore() to decide to restore the charset and collation or not.

@yudongusa
Copy link

@eurekaka would you please give an updated summary on this given all the enhancement after this issue was originally opened?

@Reminiscent
Copy link
Contributor

/assign

@Reminiscent
Copy link
Contributor

After this PR.The SPM will not be affected by charset.
In the lastest master, we can see

mysql> show global bindings;
+------------------------------------------------------+-------------------------------------------------------------------------------------------------------+------------+--------+-------------------------+-------------------------+---------+-----------+---------+
| Original_sql                                         | Bind_sql                                                                                              | Default_db | Status | Create_time             | Update_time             | Charset | Collation | Source  |
+------------------------------------------------------+-------------------------------------------------------------------------------------------------------+------------+--------+-------------------------+-------------------------+---------+-----------+---------+
| update `test` . `t` set `name` = ? where `name` <= ? | UPDATE /*+ use_index(@`upd_1` `test`.`t` `idx`)*/ `test`.`t` SET `name`='hello' WHERE `name` <= 'abc' | test       | using  | 2021-09-03 11:24:14.567 | 2021-09-03 11:24:14.567 |         |           | capture |
+------------------------------------------------------+-------------------------------------------------------------------------------------------------------+------------+--------+-------------------------+-------------------------+---------+-----------+---------+
1 row in set (0.01 sec)

@Reminiscent
Copy link
Contributor

/close

@ti-chi-bot
Copy link
Member

@Reminiscent: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-actions
Copy link

github-actions bot commented Sep 3, 2021

Please check whether the issue should be labeled with 'affects-x.y' or 'backport-x.y.z',
and then remove 'needs-more-info' label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

10 participants