Searching for Chinese characters whose UTF-8 encoding is four bytes long returns incorrect results #1541

Open
siuze opened this issue May 24, 2023 · 0 comments
Labels: bug (Something isn't working)

siuze commented May 24, 2023

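Problem description
Searching for a rare character whose UTF-8 encoding is four bytes long, such as “𠚺”, always returns a large number of results unrelated to the input:
[image]

Characters that encode to fewer than four bytes cause no problem. For example, “䶶” correctly returns an empty result:
[image]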
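
The distinction described here lines up with whether a character sits outside the Basic Multilingual Plane (code points above U+FFFF). A minimal Python sketch using only the two characters from this report; the helper name `utf8_len` is made up for illustration:

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in ("𠚺", "䶶"):
    # Code points above U+FFFF (supplementary planes) need four UTF-8 bytes;
    # CJK characters inside the BMP need three.
    is_supplementary = ord(ch) > 0xFFFF
    print(f"U+{ord(ch):04X} {ch}: {utf8_len(ch)} bytes, supplementary={is_supplementary}")
```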

Steps to reproduce
In 榕典, enter any Chinese character whose UTF-8 encoding is four bytes long and click search.
For example: https://www.ydict.net/search/%F0%A0%9A%BA
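
The last path segment of that URL is simply the percent-encoded UTF-8 bytes of the search term. A short sketch with Python's standard library shows the round trip; nothing here is specific to the site's own code:

```python
from urllib.parse import quote, unquote

# Percent-encode the character's UTF-8 bytes, as a browser would for the URL path.
encoded = quote("𠚺")

# Decode the path segment taken from the example URL in this report.
decoded = unquote("%F0%A0%9A%BA")

print(encoded)                      # the four percent-encoded bytes
print(decoded, f"U+{ord(decoded):X}")
```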

Expected behavior
Only entries containing the character “𠚺” are returned.

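Additional information
This bug has gone unfixed for a very long time. I found that loading the data into a MySQL database and querying it produces similarly wrong results. The workaround below is offered for reference:

Reproducing the problem:
[image: d9ee9682391592d723bb9b4ba994b66]

Fix: change the database collation to utf8mb4_bin
[image: c82d08c8585a29d32c1fb2667c71b2f]

Problem solved:
[image: 95b028dbecd0659a11dc45a710a310e]

The general_ci and unicode_ci collations I used before both had the problem.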
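
A likely explanation, going by the MySQL manual, is that utf8mb4_general_ci and the UCA 4.0.0 based utf8mb4_unicode_ci give every supplementary character the same weight (that of U+FFFD), so all such characters compare equal, while utf8mb4_bin compares by code point. The sketch below is only a toy model of that behavior in Python, not MySQL's implementation: it ignores case folding, the function names and sample rows are made up, and the table name `entries` in the SQL string is hypothetical.

```python
REPLACEMENT_WEIGHT = 0xFFFD  # weight assumed for every supplementary character

# One possible way to apply the fix in MySQL; `entries` is a hypothetical table name.
FIX_SQL = "ALTER TABLE entries CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;"


def toy_ci_weights(text: str) -> list:
    """Toy model: collapse every code point above U+FFFF to a single weight."""
    return [REPLACEMENT_WEIGHT if ord(ch) > 0xFFFF else ord(ch) for ch in text]


def toy_ci_contains(haystack: str, needle: str) -> bool:
    """Substring match on collapsed weights, mimicking the faulty lookup."""
    h, n = toy_ci_weights(haystack), toy_ci_weights(needle)
    return any(h[i:i + len(n)] == n for i in range(len(h) - len(n) + 1))


def bin_contains(haystack: str, needle: str) -> bool:
    """Substring match on exact code points, as a binary collation would do."""
    return needle in haystack


rows = [
    "\U000206BA",  # the character from this report
    "\U00020000",  # a different supplementary character
    "\u4DB6",      # a BMP character (three UTF-8 bytes)
]
query = "\U000206BA"

print([r for r in rows if toy_ci_contains(r, query)])  # also matches the unrelated row
print([r for r in rows if bin_contains(r, query)])     # matches only the queried character
```

Note that utf8mb4_bin also makes comparisons case-sensitive, which may matter for any Latin-script columns that rely on case-insensitive search.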

siuze added the bug label May 24, 2023