Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Online DDL/VReplication: support non-UTF8 character sets #8322

Merged

Conversation

shlomi-noach
Copy link
Contributor

Description

Supporting non UTF8 character sets in OnlineDDL/VReplication:

  • Table can be created in any character set
  • Online DDL supports copying table data even if character set is non utf8
  • Supports converting from utf8 to non-utf8
  • Supports converting from non-utf8 to utf8
  • Supports oncverting from non-utf8 to non-utf8 (e.g. latin1 to latin2)
  • Limited set of character sets supports (but the major ones are supported)

Implementation:

  • VReplication's filter->rule now has a new field, CharsetConversion, which tells Vreplication:

    • for every textual column
    • which has seen a change in character set
    • what was the original charset and what is the new charset
      VReplication plan will programmatically convert binlog text values according to the above.
  • VCopier adds a convert(<column> using utf8mb4) to any textual column where charset/collation has changed

  • VStreamer identifies the convert() function and similarly runs a convert(... using utf8mb4) on same column as VCopier hinted, so that we convert the columns on the source side

Related Issue(s)

Checklist

  • Tests were added or are not required
  • Documentation was added or is not required

Notes

Submitting code changes with a lot of commented code and with debug info, as draft pull request. Will clean it up before making it ready for review.

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach shlomi-noach added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Build/CI Component: VReplication Type: CI/Build labels Jun 13, 2021
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…r/plan ; ConvertExpr not needed

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach shlomi-noach marked this pull request as ready for review June 14, 2021 07:14
@shlomi-noach
Copy link
Contributor Author

Ready for review!

Notable:

  • online DDL instructs vstreamer to read all textual columns with convert(? using utf8mb4)
  • online DDL tells vreplication/rule whether a column has a from-to charset conversion
  • binary logs text values are always in utf8, so this lines up both
  • replication plan examines textual column and sees if there's a charset conversion, and decodes programmatically as needed

Tests are good and present multiple interesting cases (from/to utf8, from/to latin1, latin1-to-latin2, nullable, ...)

@shlomi-noach
Copy link
Contributor Author

Tests added in this PR:

  • alter-charset-non-utf8-80
  • alter-charset-non-utf8
  • convert-utf8mb4

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

Resolved conflicts having merged #8275, and cleaned up the code a bit.

Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Build/CI Component: VReplication Type: CI/Build Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants