Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Vreplication streamer to convert textual columns to UTF8 #8356

Closed

Conversation

shlomi-noach
Copy link
Contributor

Experimental

Description

In #8322 we introduced support for non-UTF8 characters. In #8322 it was the responsibility of the initiator of the stream (online DDL in this case) to analyze which columns had to convert to UTF8. It then needed to signal to the streamer the identities of those columns. This was done by applying CONVERT(col USING utf8mb4). Streamer analyzed the query, found a ConvertUsing expression, and re-applied CONVERT(col USING utf8mb4) in SendQuery.

This PR moves the burden to the streamer. The streamer reads columns from a table. It has direct access to the table. It is in the best position to determine which columns are textual and not UTF8.

In this PR the streamer issues a query on information_schema.columns to determine which columns need to be converted to UTF8. Online DDL doesn't need to hint any more, and we are able to strip out some excessive code.

But there's a problem

This doesn't work yet. Curiously, it is apparently still required that vrepl.go issues a sb.WriteString(fmt.Sprintf("convert(%s using utf8mb4)", escapeName(name))). If I replace that with sb.WriteString(escapeName(name)), then a latin1 text in a latin1 character set gets garbled and deciphered incorrectly.

To elaborate, what vrepl.go does is to generate the Filter/Rule query for vreplication.

But this confuses me. I thought the filter query in VReplication is mostly meaningless. That is, it is not the actual query that runs on the source table. Vstreamer completely rewrites the query after analyzing its expressions. Assuming vstreamer always reads textual columns via convert(col using utf8mb4), I don't see why it matters if the filter query does or does not have convert(col using utf8mb4).

The code as it is today, works. But I do want to clean it up and make the filter query as simple as possible. So I'd like to pursue this issue.

Related Issue(s)

#8322

Checklist

  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

@rohit-nayak-ps showed me how the filter query is used by vcopier in:

_, err = vc.tablePlan.applyBulkInsert(&sqlbuffer, rows, func(sql string) (*sqltypes.Result, error) {

to generate the bulk insert query.

@shlomi-noach
Copy link
Contributor Author

Closing this PR as incorrect and not-required.

@shlomi-noach shlomi-noach deleted the vrepl-streamer-convert-text-utf8 branch June 23, 2021 07:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant