source-mysql: Support collation utf8_general_ci #2056

Closed
willdonnelly opened this issue Oct 15, 2024 · 1 comment · Fixed by #2058
@willdonnelly

There are historical reasons behind that collation name, but for our purposes they don't matter: we should treat it as UTF-8 text, just as we do the utf8mb3_foobar and utf8mb4_foobar collations.

In fact that's already what we do, but only because our default behavior is "if we don't recognize it, assume it's UTF-8 compatible". So adding this collation explicitly will just eliminate a benign error log, but that's still worth doing at some point.

willdonnelly added the change:unplanned label Oct 15, 2024
willdonnelly self-assigned this Oct 15, 2024
@willdonnelly

According to a quick spot check, collations named utf8_foobar are the leading cause of MySQL CDC connector errors in production right now, so fixing this will make production error counts a lot less noisy.

willdonnelly added a commit that referenced this issue Oct 15, 2024
The collation families `utf8_whatever` and `ascii_whatever` are
explicitly added to the charsets table, so we stop logging the
`unknown charset for collation, assuming UTF-8` error and simply
treat them as UTF-8 without complaining, which is in fact correct.

(This requires another minor tweak, because the way we match up a
collation name to a character set is by simple prefix matching,
and `utf8` is a prefix of the `utf8mb3` and `utf8mb4` collations.
But a collation name is actually `<charset>_<comparisonRule>`
with an underscore, so to fix that we just have to do a prefix
match including that trailing underscore. This should still work
the same for all collation names that actually exist.)

The `replication connected without TLS` log message is downgraded
from WARN to INFO to match the main (non-replication) connection
logging.

Between these fixes, that's all the main sources of errors and
warnings in our production MySQL tasks, so hopefully that will
be much less noisy to keep an eye on in the future.

This fixes #2056
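
The prefix-matching tweak described in the commit message can be sketched roughly as follows. This is only an illustration, not the connector's actual code: the `charsetNames` list and both function names are hypothetical, and the real charsets table is not shown in this thread. It contrasts a bare prefix match, where `utf8` also matches `utf8mb4_...` collations, with a match on `<charset>_` that includes the trailing underscore.

```go
package main

import (
	"fmt"
	"strings"
)

// charsetNames is a hypothetical stand-in for the connector's charsets
// table; the real table and its contents are an assumption here.
var charsetNames = []string{"ascii", "utf8", "utf8mb3", "utf8mb4", "latin1"}

// charsetByBarePrefix matches on the charset name alone. Because "utf8"
// is a prefix of "utf8mb3" and "utf8mb4", it can claim their collations.
func charsetByBarePrefix(collation string) (string, bool) {
	for _, name := range charsetNames {
		if strings.HasPrefix(collation, name) {
			return name, true
		}
	}
	return "", false
}

// charsetForCollation matches on "<charset>_", relying on collation
// names having the form <charset>_<comparisonRule>.
func charsetForCollation(collation string) (string, bool) {
	for _, name := range charsetNames {
		if strings.HasPrefix(collation, name+"_") {
			return name, true
		}
	}
	// Unknown collation: the caller would fall back to assuming UTF-8.
	return "", false
}

func main() {
	collations := []string{"utf8_general_ci", "utf8mb4_0900_ai_ci", "ascii_bin", "weird_ci"}
	for _, c := range collations {
		bare, _ := charsetByBarePrefix(c)
		fixed, known := charsetForCollation(c)
		fmt.Printf("%-20s bare=%q fixed=%q known=%v\n", c, bare, fixed, known)
	}
}
```

With this particular list order, the bare match reports `utf8` for `utf8mb4_0900_ai_ci`, while the underscore-aware match reports `utf8mb4`. Both agree on `utf8_general_ci` and `ascii_bin`, and an unrecognized name such as `weird_ci` is reported as unknown so the caller can apply the assume-UTF-8 default.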