non-ASCII / Chinese table name #733

Closed
ghost opened this issue Jul 9, 2021 · 6 comments
Labels
VERIFIED verified issue


ghost commented Jul 9, 2021

2021-07-05T17:50:39.414+0800 [WARN] client.driver_mgr.dtle: ValidateOriginalTable error: driver=dtle table=tbl_order——_copy @module=dtle.extractor err="Error 1146: Table 'test.tbl_order??_copy' doesn't exist" job=testenv0 schema=test timestamp=2021-07-05T17:50:39.413+0800
@ghost ghost added this to the next milestone Jul 9, 2021
@ghost ghost mentioned this issue Feb 23, 2022
ghost pushed a commit that referenced this issue Mar 17, 2022
for range on string iterates by rune

ghost commented Mar 17, 2022

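A pitfall of `for range stringVar` in Go:

  • It iterates by Unicode code point, not by byte.
s := "a中文"
for i, v := range s {
    _ = s[i] // byte at offset i, i.e. the first byte of the current rune
    _ = v    // rune: the decoded code point; i takes the values 0, 1, 4
}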
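To make the difference concrete, here is a minimal, self-contained sketch (not code from dtle; the helper name `runeOffsets` is made up for illustration). It shows that the index produced by `range` is a byte offset that jumps over multi-byte characters, while an index loop visits every byte.

```go
package main

import "fmt"

// runeOffsets returns the byte offset at which each rune of s starts.
// (Hypothetical helper, only for illustration.)
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s { // range decodes UTF-8: one iteration per rune
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "a中文"

	fmt.Println(len(s))         // number of bytes: 7
	fmt.Println(len([]rune(s))) // number of runes: 3
	fmt.Println(runeOffsets(s)) // [0 1 4]

	// Byte-wise iteration needs an explicit index loop.
	for i := 0; i < len(s); i++ {
		fmt.Printf("%02x ", s[i])
	}
	fmt.Println()
}
```

Code that needs to process a name byte by byte (for example when escaping or re-encoding it) should use the index loop or convert to `[]byte` first.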

@ghost ghost closed this as completed Mar 17, 2022
@ghost ghost modified the milestones: next, 4.22.03.0 Mar 17, 2022
@ghost ghost reopened this Mar 17, 2022

ghost commented Mar 17, 2022

2. During incremental replication, a DDL is executed in a non-UTF8 encoding and contains non-ASCII table names

    2022-03-17T18:24:16.533+0800 [DEBUG] client.driver_mgr.dtle: query event: driver=dtle query="create table `s����`.`t4����` (id int)" schema= @module=reader job=a1-migration timestamp=2022-03-17T18:24:16.532+0800
    2022-03-17T18:24:16.533+0800 [INFO]  client.driver_mgr.dtle: Skip QueryEvent: driver=dtle currentSchema= gno=115618 job=a1-migration sql="create table `s����`.`t4����` (id int)" tableName=t4ÖÐÎÄ @module=reader realSchema=sÖÐÎÄ timestamp=2022-03-17T18:24:16.533+0800

Problem: job.hcl uses UTF8, so dtle does not consider the garbled schema and table names to be within the replication scope.
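The mismatch can be reproduced with the standard library alone. The sketch below is an illustration, not dtle's actual code path: it assumes the gbk bytes of the name end up being read one byte per character (Latin-1 style), which is what the `tableName=t4ÖÐÎÄ` log field suggests. The helper `latin1ToUTF8` and the hard-coded gbk bytes for "中文" (D6 D0 CE C4) are supplied here for the example.

```go
package main

import "fmt"

// latin1ToUTF8 treats every input byte as one Latin-1 code point.
// (Hypothetical helper; it mimics a byte-wise misreading of gbk data.)
func latin1ToUTF8(b []byte) string {
	r := make([]rune, len(b))
	for i, c := range b {
		r[i] = rune(c)
	}
	return string(r)
}

func main() {
	// "t4中文" as it appears in a gbk-encoded statement.
	gbkName := []byte{'t', '4', 0xD6, 0xD0, 0xCE, 0xC4}

	// The table name as written in a UTF8 job.hcl.
	configured := "t4中文"

	seen := latin1ToUTF8(gbkName)
	fmt.Println(seen)               // t4ÖÐÎÄ
	fmt.Println(seen == configured) // false: the table looks out of scope
}
```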


ghost commented Mar 21, 2022

Construct a DDL with mixed encodings and execute it:

echo 'set names gbk;' > create.sql
echo 'create table s中文.tdef (id中文 int primary key auto_increment,' | iconv -t gbk >> create.sql
echo ' val varchar(50) default _utf8mb4"aa中文");' >> create.sql

mysql -h ... < create.sql

The binlog QueryEvent records the original statement bytes as-is. In QueryEvent.StatusVars, character_set_client=28, i.e. gbk.
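As a rough sketch of where that value lives: in the MySQL binlog format the status variables are a sequence of one-byte codes followed by a code-specific payload, and code 0x04 (Q_CHARSET_CODE) carries three little-endian uint16 values, the first being character_set_client. The function below is an assumption-laden illustration, not dtle's parser: the name `clientCharsetID` is invented, only codes 0x00 through 0x06 are handled, and the sample bytes in `main` are hand-built rather than taken from a real event.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Status-var codes as documented for the MySQL Query event.
const (
	qFlags2        = 0x00 // 4-byte payload
	qSQLMode       = 0x01 // 8-byte payload
	qCatalog       = 0x02 // length byte, string, trailing NUL
	qAutoIncrement = 0x03 // 2 + 2 bytes
	qCharset       = 0x04 // 3 x uint16: client, connection collation, server collation
	qTimeZone      = 0x05 // length byte, string
	qCatalogNZ     = 0x06 // length byte, string
)

var errTruncated = errors.New("status vars truncated")

// clientCharsetID scans statusVars for Q_CHARSET_CODE and returns
// character_set_client. It gives up at the first code it does not know.
func clientCharsetID(statusVars []byte) (uint16, error) {
	i := 0
	for i < len(statusVars) {
		code := statusVars[i]
		i++
		switch code {
		case qFlags2, qAutoIncrement:
			i += 4
		case qSQLMode:
			i += 8
		case qCatalog, qTimeZone, qCatalogNZ:
			if i >= len(statusVars) {
				return 0, errTruncated
			}
			i += 1 + int(statusVars[i])
			if code == qCatalog {
				i++ // skip the trailing NUL
			}
		case qCharset:
			if i+6 > len(statusVars) {
				return 0, errTruncated
			}
			return binary.LittleEndian.Uint16(statusVars[i:]), nil
		default:
			return 0, errors.New("charset code not found before an unhandled status var")
		}
	}
	return 0, errors.New("charset code not found")
}

func main() {
	// Hand-built sample: flags2, sql_mode, catalog "std", then charset 28/28/45.
	vars := []byte{
		0x00, 0, 0, 0, 0,
		0x01, 0, 0, 0, 0, 0, 0, 0, 0,
		0x06, 3, 's', 't', 'd',
		0x04, 28, 0, 28, 0, 45, 0,
	}
	id, err := clientCharsetID(vars)
	fmt.Println(id, err) // 28 <nil>
}
```

An ID of 28 corresponds to the collation gbk_chinese_ci, so the statement text has to be decoded as gbk before any name comparison.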


ghost commented Mar 22, 2022

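Taking a gbk DDL as an example:

Option 1: convert everything to UTF8 and execute as UTF8.

  • Keep introducer strings such as _sjis"あああ" in their original encoding
  • Convert schema/table/column names, comments, etc. in the AST

Option 2: convert schema, table, and column names to UTF8 only for the replication-scope comparison. Keep the original statement and execute it as gbk.

  • sqle needs a UTF8 AST
  • Note that names in Rename must be converted back to gbk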


ghost commented Mar 24, 2022

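All things considered, we decided to go with Option 1, with the following limitation:

  • For a non-UTF8(MB4) DDL, e.g. GBK, using an encoding introducer in the DDL (such as _sjis"あああ") is not supported.
    • If one is used, the result is garbled text / inconsistent data.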
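A user who wants to catch such statements before they reach replication could screen DDL text for introducers. The following is only a heuristic sketch and is not part of dtle: the function name `introducers` and the regular expression are made up here, and a regex cannot tell an introducer from similar-looking text inside a string literal or comment.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// introducerRe matches a token such as _sjis"..." or _utf8mb4'...':
// an underscore that does not continue an identifier, a charset name,
// optional whitespace, then an opening quote. Heuristic only.
var introducerRe = regexp.MustCompile(`(?i)(?:^|[^0-9a-z_$])_([0-9a-z]+)\s*['"]`)

// introducers returns the charset names of all introducer-like tokens in ddl.
func introducers(ddl string) []string {
	var names []string
	for _, m := range introducerRe.FindAllStringSubmatch(ddl, -1) {
		names = append(names, strings.ToLower(m[1]))
	}
	return names
}

func main() {
	ddl := `create table gbk_name (val varchar(50) default _sjis"sjis_value")`
	fmt.Println(introducers(ddl)) // [sjis]
}
```

Identifiers that merely contain an underscore, such as gbk_name or sjis_value, are not reported because the underscore there is preceded by an identifier character.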

ghost pushed a commit that referenced this issue Mar 24, 2022
limitation: do not use hybrid encoded ddl. e.g

create table gbk_name (val varchar(50) default _sjis"sjis_value");
@ghost ghost closed this as completed Mar 28, 2022

asiroliu commented Mar 29, 2022

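Expanded test coverage: added Chinese-character cases to the existing DDL test cases, covering the following:

  • Schema name
  • Table name
  • Column name
  • Index name
  • rename of a Chinese table name
  • truncate of a Chinese table name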
  • VIEW
  • FUNCTION
  • PROCEDURE
  • EVENT
  • TRIGGER

@asiroliu asiroliu added the VERIFIED verified issue label Mar 29, 2022
ghost pushed a commit that referenced this issue Aug 1, 2022
and take all unrecognized character set as binary.
ghost pushed a commit that referenced this issue Aug 2, 2022
and take all unrecognized character set as binary.
This issue was closed.