Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mysql: use UTF8mb4 as consistent connection charset #9558

Merged
merged 9 commits into from
Jan 27, 2022

Conversation

vmg
Copy link
Collaborator

@vmg vmg commented Jan 25, 2022

Description

OK, after a lot of testing I have what I believe is a working solution for the weird collation behavior we've been seeing in Vitess clusters. What I figured out is that the workaround that we implemented in our MySQL connection code to support collations >255 was not really working. As a reminder, when connecting to a MySQL server, a handshake is performed between client and server to agree on a "connection charset" (as defined in the MySQL documentation). Now, two critical facts about this: the wire protocol only has 1 byte for this charset, so it's not possible to use charsets higher than 255; and what the documentation calls "charset" is in fact the ID of a full collation, that can very well go above 255.

This PR hence does the following things:

  1. It removes the db_collation flag that @frouioui and I introduced a couple months ago. Since this flag never made it to any Vitess releases, there's no deprecation notice needed. As a result of this investigation, we now understand that setting a "charset" and a "collation" for a connection is redundant, as the charset is actually a collation ID. The workaround we had to enforce a custom collation, which was doing it via SQL once the connection was open, is not giving reliable results: we were performing a SET collation_connection call, but the semantics of this change are weird and it appears to only apply to some string comparisons in SQL. It's not a wholistic change like what we'd get by passing a charset via the handshake when connecting.

  2. It adds support for passing collation names to the db_charset flag as long as those collations have a collation ID <=255, so that it fits in the handshake packet. This is a non-breaking change that matches the behavior of the commandline flags to mysql (the MySQL CLI client). You can pass a charset name like we were doing before, such as utf8mb4, but you can also pass a full collation like utf8m4_general_ci. Vitess will error out if you attempt to pass a collation that resolves to an ID > 255. Fortunately, all the collation IDs for the default collations for all charsets in all MySQL versions are <=255.

  3. It reliably changes the default charset used for Vitess clusters based on the underlying MySQL server version. Vitess clusters based on MySQL 5.x will use utf8mb4_general_ci by default (unless configured otherwise with the db_charset flag). Vitess clusters based on MySQL 8+ will use utf8mb4_0900_ai_ci. To note: utf8mb4_0900_ai_ci is the default charset for a MySQL 8.0 deployment, and Vitess will not match that behavior. However, utf8mb4_general_ci is not the default charset for a MySQL 5.7 deployment: that'd be utf8_general_ci. We believe this is a terrible default, particularly since utf8 is a deprecated charset in MySQL, so we want to enforce all users to use utf8mb4 charsets even in 5.7 to prevent them from shooting themselves in the foot.

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Signed-off-by: Vicent Marti <vmg@strn.cat>
go/mysql/server.go Outdated Show resolved Hide resolved
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
@vmg
Copy link
Collaborator Author

vmg commented Jan 26, 2022

edit: moved to the PR body

Signed-off-by: Vicent Marti <vmg@strn.cat>
@vmg
Copy link
Collaborator Author

vmg commented Jan 26, 2022

@frouioui green and ready for re-review

@deepthi deepthi removed their request for review January 27, 2022 02:27
Copy link
Member

@frouioui frouioui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Looking good to me 🚀

go/mysql/collations/env.go Show resolved Hide resolved
go/mysql/collations/env.go Show resolved Hide resolved
Signed-off-by: Vicent Marti <vmg@strn.cat>
@vmg vmg merged commit af3f66d into vitessio:main Jan 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants