Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary column corruption ruby tests 157 #159

Conversation

kolbitsch-lastline
Copy link

@kolbitsch-lastline kolbitsch-lastline commented Mar 11, 2020

Link: #157

@shuhaowu
Copy link
Contributor

I see you made some changes into vendor/. Upstream is usually pretty receptive to PRs. Perhaps we can send a PR there?

@kolbitsch-lastline
Copy link
Author

I see you made some changes into vendor/. Upstream is usually pretty receptive to PRs. Perhaps we can send a PR there?

yes, see my comment in the git commit message.

the point of the pull-request is to get your feedback on the change... if you disagree this is how we want to implement this, doing a PR to the upstream is a waste of time :-) . But yes, if you agree we want to change it as I propose, we clearly need to send the vendor'ized changes to upstream


ghostferry = new_ghostferry(MINIMAL_GHOSTFERRY)

ghostferry.on_status(Ghostferry::Status::ROW_COPY_COMPLETED) do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use BINLOG_STREAMING_STARTED instead? Should be less racy that way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I actually think I tried that. Maybe I misunderstood what's the sequence of events.

When is binlog streaming started in this scenario - after the copy is complete or before that? I thought the former. Thus, I really want the test to happen after we are copying, because the bug occurs only during binlog streaming, not during the copy (as far as I can tell)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Binlog streaming and data copy begin and occur simultaneously.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when the ROW_COPY_COMPLETED event is sent, the binlog streamer is about to be terminated. I don't recall 100%, but I think there's a strong race possibility that the binlog is not picked up by Ghostferry.

When these events are sent and these ruby blocks are called, the Ghostferry code is blocked from proceeding until the ruby block returns. As such, a more fool-proof test would be something like this:

source_db.query("INSERT INTO ... (binary data...)")
row_copy_called = false
binlog_appled_called = false
ghostferry.on_status(Ghostferry::Status::AFTER_ROW_COPY) do
  row_copy_called = true
  # select row from the target and then make sure the data with 0 padding is present
  # Perform the update query to set the data to `updated_data`.
end

ghostferry.on_status(Ghostferry::Status::AFTER_BINLOG_APPLY) do
  binlog_apply_called = true
  # select row from the target and make sure the data is on the updated_data
end

# run ghostferry,
# assert table identical
assert binlog_apply_called
assert row_copy_called
# if we want to be defensive, query the target and make sure the data is updated_data.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried your suggestion, but am running into two issues with this:

first, updating the source-DB from the context of AFTER_ROW_COPY fails. Do we hold some type of lock at the time this is invoked?

second, when checking in AFTER_BINLOG_APPLY, it seems binlog applying isn't fully done yet. Thus, it keep seeing the old data and the validation fails

I understand your proposed code, but seems that's equally racy or something else is off

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah you're right. Sorry for the faulty suggestion. In AFTER_BINLOG_APPLY there's indeed a lock due to SELECT FOR UPDATE. I misremembered when I made that suggestion. For now using BINLOG_STREAMING_STARTED should work slightly better. At some point in the future we can make the "breakpoints" a little bit more robust and less racy.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, thanks for the details. No problem for the misdirection - glad I wasn't doing anything wrong

to avoid a misunderstanding: so is there anything else that we should fix?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than using BINLOG_STREAMING_STARTED, not now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then there seems to have been a misunderstanding - that's why I was asking if streaming was started in parallel with the row-copy above (and you answered they are started in parallel...)

if we use BINLOG_STREAMING_STARTED , we race in the unit-test (and actually usually lose the race), meaning that callback is called before the row-copy has inserted the original data in the target.

@shuhaowu
Copy link
Contributor

Let me know If I understand this issue correctly:

  1. if there's a BINARY(N) column and if you insert a some data where its length is < N, mysql will automatically pad the data with \0.
  2. Within the DataIterator: Ghostferry retrieves the binary column, it will return the data with trailing '\0'. Thus, data copied by Ghostferry will remain unaffected.
  3. Within the the BinlogStreamer: We use siddontang/go-mysql's replication module, which strips the \0 out. This implies that the data in the mysql binlog stream is fine, just there's a bug in the upstream module.
  4. The data with the stripped \0 reaches Ghostferry and we use it in the WHERE clause for UPDATE and DELETE, which matches no rows and thus causes silent data corruption within the BinlogStreamer only.

Thus, the fix here is:

  • If we detect a BINARY(N) column and we get data that's less than N, we right pad the data with \0 until the length of the data = N.
  • The patch made to the upstream library is to change the schema definition to add the BINARY and VARBINARY types. It's not to make it so the upstream library returns the correct data to ghostferry.

Questions:

  • What about VARBINARY? Do we need anymore handling for that type other than this patch? I'm not very familiar with this data type. The MySQL doc says "For VARBINARY, there is no padding for inserts and no bytes are stripped for retrievals", but I'm not certain how applicable this is here.

Additional comments:

  • You may have noted that CI has been failing. This maybe due to one of the race conditions within tests outlined in Race condition in tests causing intermittent failures #107. Basically the only advice I have is to retry the tests until the pass if you know the failing test is caused by one of the documented issues. Fixing them may also be easy and is also very welcome (in a separate PR).

@kolbitsch-lastline
Copy link
Author

if there's a BINARY(N) column and if you insert a some data where its length is < N, mysql will automatically pad the data with \0.

correct:

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> use test;
Database changed

mysql> CREATE TABLE test (id bigint(20) not null auto_increment, data BINARY(4), primary key(id));
Query OK, 0 rows affected (0.06 sec)

mysql> INSERT INTO test (id, data) values (1, 'AB');
Query OK, 1 row affected (0.00 sec)

mysql> select id, data, hex(data) from test where id = 1;
+----+------+-----------+
| id | data | hex(data) |
+----+------+-----------+
|  1 | AB   | 41420000  |
+----+------+-----------+
1 row in set (0.00 sec)

as you can see, I inserted AB but the DB stores AB\x00\x00. The binlog will contain AB though.

same can be seen if I do a simple query later on:

mysql> select id, data, hex(data) from test where id = 1 and data = 'AB';
Empty set (0.00 sec)

As you can see, I cannot SELECT the data that I INSERTed before. Selecting on the padded string, however, works:

mysql> select id, data, hex(data) from test where id = 1 and hex(data) = '41420000';
+----+------+-----------+
| id | data | hex(data) |
+----+------+-----------+
|  1 | AB   | 41420000  |
+----+------+-----------+
1 row in set (0.00 sec)

(I'm using the hex stuff to avoid having to print the \x00 here)

Within the DataIterator: Ghostferry retrieves the binary column, it will return the data with trailing '\0'. Thus, data copied by Ghostferry will remain unaffected.

I have not experimented yet with the copy, but that's my next step. But, yes, I agree it should not be affected

Within the the BinlogStreamer: We use siddontang/go-mysql's replication module, which strips the \0 out. This implies that the data in the mysql binlog stream is fine, just there's a bug in the upstream module.

not quite: it does not strip it, it never ends up in the binlog. So, it's the mysql server's "fault", not the replication module's.

Essentially ghostferry receives what the user inserted, but not the 0-padding added by mysql on the insert.

NOTE: I was not able to find this behavior in the documentation. I'm inferring this from experiments, but I have done a lot of it, so I'm fairly confident what I'm writing here reflects reality. But there could be a different cause for all this that matches the same symptoms.

The data with the stripped \0 reaches Ghostferry and we use it in the WHERE clause for UPDATE and DELETE, which matches no rows and thus causes silent data corruption within the BinlogStreamer only.

correct.

Thus, the fix here is:

If we detect a BINARY(N) column and we get data that's less than N, we right pad the data with \0 until the length of the data = N.

correct.

The patch made to the upstream library is to change the schema definition to add the BINARY and VARBINARY types. It's not to make it so the upstream library returns the correct data to ghostferry.

correct.

Questions:

What about VARBINARY? Do we need anymore handling for that type other than this patch? I'm not very familiar with this data type. The MySQL doc says "For VARBINARY, there is no padding for inserts and no bytes are stripped for retrievals", but I'm not certain how applicable this is here.

I believe VAR* types are not affected, as there is no padding. IMO it should only happen for BINARY and CHAR - that is, any column that supports fixed size and where the actual length in the column is not stored as separate attribute on the row-column

Additional comments:

You may have noted that CI has been failing. This maybe due to one of the race conditions within tests outlined in #107. Basically the only advice I have is to retry the tests until the pass if you know the failing test is caused by one of the documented issues.

oh, thanks for pointing this out! I was already puzzled by this. My local tests seem to work fine. So, I can confirm it works. Is there a way to retrigger QA without updating the PR?

Fixing them may also be easy and is also very welcome (in a separate PR).

hehe, I think I'll prioritize fixing the other issues I have brought up :-D But, yes, if I happen to see what is causing this, happy to send a PR ;-)

@shuhaowu
Copy link
Contributor

oh, thanks for pointing this out! I was already puzzled by this. My local tests seem to work fine. So, I can confirm it works. Is there a way to retrigger QA without updating the PR?

There should be a rebuild button on CI. You have to be logged into travis via your Github account tho. You might be able to just click rerun on the GH interface as well, but I've never tried.

@kolbitsch-lastline
Copy link
Author

I'm logged in, but I can't retrigger the build - I think it's a permission issue. According to the post here:

https://stackoverflow.com/questions/17606874/trigger-a-travis-ci-rebuild-without-pushing-a-commit

this is only possible if the account has write permissions, which I don't. Do you mind retriggering it? Assuming these are perma-links , you should be able to retrigger it here:

https://travis-ci.com/github/Shopify/ghostferry/builds/152736013

(for me, there is no `restart build" button as shown on the stackoverflow post)

@shuhaowu
Copy link
Contributor

shuhaowu commented Mar 11, 2020

Triggered. Sorry about the messy state of CI. You can also push some empty commits to retrigger builds.

Back to the topic at hand, I think you're right about this being a mysql "bug". I just did my own tests:

mysql> create table a (d BINARY(5));
Query OK, 0 rows affected (0.02 sec)

mysql> insert into a values (_binary'a');
Query OK, 1 row affected (0.01 sec)

mysql> select hex(d) from a;
+------------+
| hex(d)     |
+------------+
| 6100000000 |
+------------+
1 row in set (0.00 sec)

Using mysqlbinlog -vvv, I see:

# at 540
#200311 19:58:14 server id 1  end_log_pos 611 CRC32 0xd85350b1  Query   thread_id=3     exec_time=0     error_code=0
SET TIMESTAMP=1583956694/*!*/;
BEGIN
/*!*/;
# at 611
#200311 19:58:14 server id 1  end_log_pos 656 CRC32 0x742aeeee  Table_map: `abc`.`a` mapped to number 108
# at 656
#200311 19:58:14 server id 1  end_log_pos 694 CRC32 0xdd1079b9  Write_rows: table id 108 flags: STMT_END_F

BINLOG '
1kJpXhMBAAAALQAAAJACAAAAAGwAAAAAAAEAA2FiYwABYQAB/gL+BQHu7ip0
1kJpXh4BAAAAJgAAALYCAAAAAGwAAAAAAAEAAgAB//4BYbl5EN0=
'/*!*/;
### INSERT INTO `abc`.`a`
### SET
###   @1='a' /* STRING(5) meta=65029 nullable=1 is_null=0 */
# at 694
#200311 19:58:14 server id 1  end_log_pos 725 CRC32 0xfb77b9e0  Xid = 15
COMMIT/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file

Specifically, the line says: ### @1='a' /* STRING(5) meta=65029 nullable=1 is_null=0 */ suggests to me that the MySQL binlog doesn't contain the \0 at the end. Maybe mysqlbinlog is stripping the /0, we'd have to confirm with mysql source code I'm afraid...

@shuhaowu
Copy link
Contributor

shuhaowu commented Mar 11, 2020

Since you're looking a little more closely at go-mysql, here are some links of how some other code are handling BINARY in replication. They also remind me that this code may affect CHAR.

Edit: doesn't seem to affect CHAR:

mysql> create table b (d char(5));
Query OK, 0 rows affected (0.03 sec)

mysql> insert into b values ('a');
Query OK, 1 row affected (0.00 sec)

mysql> select hex(d) from b;
+--------+
| hex(d) |
+--------+
| 61     |
+--------+
1 row in set (0.00 sec)


@kolbitsch-lastline
Copy link
Author

oh, interesting. Essentially the java-equivalent of what I suggested.

I think we agree on how to move forward then, correct? I'll go and try to get the patch into the upstream of go-mysql so we can use it here...

@shuhaowu
Copy link
Contributor

After some further investigation on related projects, it seems that others are running into the same issue. Specifically, shyiko/mysql-binlog-connector-java#169 is like siddontang/go-mysql, which is unable to fix this issue it seems. Debezium is a consumer of that library much like Ghostferry is, and they documented the issue here: https://issues.redhat.com/browse/DBZ-254. And their solution sounds very similar to yours and is implemented at debezium/debezium#237. I'm going to review that PR to see if we're doing anything differently here.

@kolbitsch-lastline
Copy link
Author

haha, I can't believe how similar the examples between

https://dev.mysql.com/doc/refman/5.7/en/binary-varbinary.html

and my post above are - what a coincidence :-)

But, glad to see that we have a concrete page to refer to . Working on a PR to go-mysql now

@shuhaowu
Copy link
Contributor

Just reviewed the Debezium PR. For future reference, Debezium's fix is here: https://github.com/debezium/debezium/blob/2c56997/debezium-connector-mysql/src/main/java/io/debezium/connector/mysql/MySqlValueConverters.java#L611-L614

This approach is exactly the same as the ones proposed here.

Copy link
Contributor

@shuhaowu shuhaowu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach looks good to me. We need to merge the upstream changes but idk if we are ready to update go-mysql, so maybe a vendor patch here is fine.

dml_events.go Outdated
@@ -406,17 +410,21 @@ func Int64Value(value interface{}) (int64, bool) {
//
// ref: https://github.com/mysql/mysql-server/blob/mysql-5.7.5/mysys/charset.c#L963-L1038
// ref: https://github.com/go-sql-driver/mysql/blob/9181e3a86a19bacd63e68d43ae8b7b36320d8092/utils.go#L717-L758
func appendEscapedString(buffer []byte, value string) []byte {
func appendEscapedString(buffer []byte, value string, padToLength uint) []byte {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename this to rightPadLengthForBinaryColumn to make it more verbose? Can you also leave some comments in the code to link to this PR so people in the future can figure out why this is done?

dml_events.go Outdated
c := value[i]
if c == '\'' {
buffer = append(buffer, '\'', '\'')
} else {
buffer = append(buffer, c)
}
}
for ; i < int(padToLength); i++ {
buffer = append(buffer, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Appending 0 is OK here? Don't need something like \0? I ask because this buffer is eventually going to be turned into a string.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buffer is a []byte . As such, 0 seems the right thing to me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but buffer eventually gets turned into a string, see https://github.com/Shopify/ghostferry/blob/e3f52d4/dml_events.go#L286. I'm not certain if that's the right encoding for the query as I'd assume MySQL wants you to specify \0 in a statement, especially since we're not using prepared statements here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I saw your comment too late and I just pushed the new patch-set. Let me check a bit more - I thought the binary conversion would end up being the same thing

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

admittedly: I'm coming from a c/c++ and Python background, so I don't know if it really makes a difference

in golang, 0 is a int while '\x00' is a int32 but they should be the same value when encoding. But I have no problem using '\x00' if that's cleaner

dml_events.go Outdated
c := value[i]
if c == '\'' {
buffer = append(buffer, '\'', '\'')
} else {
buffer = append(buffer, c)
}
}
for ; i < int(padToLength); i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit clever such that it might confuse future readers, but may be OK.


ghostferry = new_ghostferry(MINIMAL_GHOSTFERRY)

ghostferry.on_status(Ghostferry::Status::ROW_COPY_COMPLETED) do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when the ROW_COPY_COMPLETED event is sent, the binlog streamer is about to be terminated. I don't recall 100%, but I think there's a strong race possibility that the binlog is not picked up by Ghostferry.

When these events are sent and these ruby blocks are called, the Ghostferry code is blocked from proceeding until the ruby block returns. As such, a more fool-proof test would be something like this:

source_db.query("INSERT INTO ... (binary data...)")
row_copy_called = false
binlog_appled_called = false
ghostferry.on_status(Ghostferry::Status::AFTER_ROW_COPY) do
  row_copy_called = true
  # select row from the target and then make sure the data with 0 padding is present
  # Perform the update query to set the data to `updated_data`.
end

ghostferry.on_status(Ghostferry::Status::AFTER_BINLOG_APPLY) do
  binlog_apply_called = true
  # select row from the target and make sure the data is on the updated_data
end

# run ghostferry,
# assert table identical
assert binlog_apply_called
assert row_copy_called
# if we want to be defensive, query the target and make sure the data is updated_data.

@kolbitsch-lastline
Copy link
Author

The approach looks good to me.

cool, working on a final PS

We need to merge the upstream changes but idk if we are ready to update go-mysql, so maybe a vendor patch here is fine.

ok, then let's start with a vendor-patch and - in parallel - work on getting our changes into upstream

@kolbitsch-lastline kolbitsch-lastline force-pushed the binary-column-corruption-ruby-tests-157 branch 2 times, most recently from 1a9ea62 to e7fb1ba Compare March 11, 2020 22:28
inserted_data = "ABC\x00"
updated_data = "EFGH"

source_db.query("INSERT INTO #{DEFAULT_FULL_TABLE_NAME} (id, data) VALUES (1, _binary'#{inserted_data}')")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be inserting a value that is shorter than the column definition(eg: "ABC") and let mysql add the trailing 0s to reproduce the exact same issue reported here ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interestingly mysql seems to even strip the trailing whitespace if done like this. But your suggestion is perfectly valid and maybe a slightly better test-case

@kolbitsch-lastline kolbitsch-lastline force-pushed the binary-column-corruption-ruby-tests-157 branch from e7fb1ba to 825bd63 Compare March 12, 2020 18:57
Copy link
Contributor

@fjordan fjordan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! A few comments

# update/delete statements, as the WHERE clause would not match existing
# rows in the target DB
inserted_data = "ABC"
expected_inserted_data = inserted_data + "\x00"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nitpick, but in ruby this is preferably expressed as:

Suggested change
expected_inserted_data = inserted_data + "\x00"
expected_inserted_data = "#{inserted_data}\x00"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack


[source_db, target_db].each do |db|
db.query("CREATE DATABASE IF NOT EXISTS #{DEFAULT_DB}")
db.query("CREATE TABLE IF NOT EXISTS #{DEFAULT_FULL_TABLE_NAME} (id bigint(20) not null auto_increment, data BINARY(4), primary key(id))")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for a case when the length is 0 (ex: create table a (d binary);) and for the decimal type above?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for a case when the length is 0 (ex: create table a (d binary);)

sure

and for the decimal type above?

Just to avoid an unnecessary round-trip: what do you mean by that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the decimal.Decimal case, the function is called with a length of 0 - appendEscapedString(buffer, v.String(), 0). I meant a test case of length 0 would cover both the decimal.Decimal case as well as if the length of the column was 0 (ie: create table a (d binary);).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. I thought we already had a test-case for decimals, but I'll check.

dml_events.go Outdated
Comment on lines 415 to 429
//
// We need to support right-padding of the generated string using 0-bytes to
// mimic what a MySQL server would do for BINARY columns (with fixed length).
//
// ref: https://github.com/Shopify/ghostferry/pull/159
//
// This is specifically mentioned in the the below link:
//
// When BINARY values are stored, they are right-padded with the pad value
// to the specified length. The pad value is 0x00 (the zero byte). Values
// are right-padded with 0x00 for inserts, and no trailing bytes are removed
// for retrievals.
//
// ref: https://dev.mysql.com/doc/refman/5.7/en/binary-varbinary.html
func appendEscapedString(buffer []byte, value string, rightPadLengthForBinaryColumn uint) []byte {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think moving the 0-byte padding into a separate function would be cleaner, as the padding doesn't really have anything to do with properly escaping strings. That way, we only need to call it if column.Type == schema.TYPE_BINARY and don't need to modify the case of decimal.Decimal

Reusing the naming, we can make a function like:

 func rightPadLengthForBinaryColumn(buffer []byte, columnLength uint) []byte {
    ...
}

and just move the logic there. It can be called directly after appendEscapedString in line 357.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true - seems cleaner and avoids the other invocation of the method with the awkward 0 parameter

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, let me take that back. I agree it's cleaner, but the method currently does the quoting (as well as escaping the ' ) and knows about the length of the string

sure, all of that can be broken apart, but does that make it more readable?

Copy link
Contributor

@fjordan fjordan Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I think handling the trailing '\'' that is appended to the buffer in a separate method would make this less readable.

Another option would be to pass the column to the appendEscapedString method and conditionally handle the padding in that method:

// We need to support right-padding of the generated string using 0-bytes to
// mimic what a MySQL server would do for BINARY columns (with fixed length).
//
// ref: https://github.com/Shopify/ghostferry/pull/159
//
// This is specifically mentioned in the the below link:
//
//    When BINARY values are stored, they are right-padded with the pad value
//    to the specified length. The pad value is 0x00 (the zero byte). Values
//    are right-padded with 0x00 for inserts, and no trailing bytes are removed
//    for retrievals.
//
// ref: https://dev.mysql.com/doc/refman/5.7/en/binary-varbinary.html
func rightPadLengthForBinaryColumn(buffer []byte, padLength int) []byte {
	// continue 0-padding up to the desired length as provided by the
	// caller
	for i := 0; i < int(padLength); i++ {
		buffer = append(buffer, '\x00')
	}

	return buffer
}

// appendEscapedString replaces single quotes with quote-escaped single quotes.
// When the NO_BACKSLASH_ESCAPES mode is on, this is the extent of escaping
// necessary for strings.
//
// ref: https://github.com/mysql/mysql-server/blob/mysql-5.7.5/mysys/charset.c#L963-L1038
// ref: https://github.com/go-sql-driver/mysql/blob/9181e3a86a19bacd63e68d43ae8b7b36320d8092/utils.go#L717-L758
func appendEscapedString(buffer []byte, value string, column schema.TableColumn) []byte {
	buffer = append(buffer, '\'')

	for i := 0; i < len(value); i++ {
		c := value[i]
		if c == '\'' {
			buffer = append(buffer, '\'', '\'')
		} else {
			buffer = append(buffer, c)
		}
	}

	if column.Type == schema.TYPE_BINARY {
		padLength := int(column.FixedSize) - len(value)
		buffer = rightPadLengthForBinaryColumn(buffer, padLength)
	}

	return append(buffer, '\'')
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO appendEscapedString is a pure string/conversion method and should not have to worry about columns or types. I have refactored out the trivial method for 0-padding now, but I feel making the former method aware of columns or types seems going the wrong direction (making the method less useful in other scenarios)

it's a nit and I'm fine either way, but I feel passing the column is wrong here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly - just that it seemed a bit awkward to be forcing a param for all cases when not all applied. Either way can work 👍

@kolbitsch-lastline kolbitsch-lastline force-pushed the binary-column-corruption-ruby-tests-157 branch from 825bd63 to e64ef4e Compare March 12, 2020 23:21
dml_events.go Outdated
// continue 0-padding up to the desired length as provided by the
// caller
for ; i < int(rightPadLengthForBinaryColumn); i++ {
buffer = append(buffer, '\x00')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to say that we're supposed to generate MySQL queries here. Is having a \0valid in a query? Or should it be "\0"? The difference is that the latter is the literal ascii character '\' and '0' while the former is actually just 0, if that makes sense.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh, I agree :-) My original code was injecting literal 0 (that is, the byte, not the character), and then I switched it to be \x00 because it was suggested as improvement.

Still, please note that in neither case are we inserting a 2-character string of \0 (that is, a \ and a 0).

The only difference I can see is that one is an int and the other an int32. However: compared to you guys, I'm probably a golang noob and may simply be wrong here.

But, yes, we are building SQL statements, and we want to have literal 0-bytes (not 0 strings or \0 strings) be injected.

@kolbitsch-lastline kolbitsch-lastline force-pushed the binary-column-corruption-ruby-tests-157 branch from e64ef4e to 7c3890d Compare March 13, 2020 16:54
This commit addresses a data corruption bug in the binlog streaming
phase, where the binlog writer incorrectly propagates values in
fixed-length BINARY columns that have trailing 0s in the original value.
These trailing 0s are removed from the binlog by the SQL master and
therefore do not show up in the WHERE clause for update/delete
statements executed by the binlog writer.

NOTE: This commit requires changes to one of the vendor'ed modules in

    github.com/siddontang/go-mysql

that we patch directly in the local repo. We are working on getting
these changes into the upstream module and will need to merge these
changes once we can use the latest upstream module version.

Change-Id: Ib9c1b7308e8198f1fd38439c37f444d9a8154e6a
@kolbitsch-lastline kolbitsch-lastline force-pushed the binary-column-corruption-ruby-tests-157 branch from 7c3890d to 37f2188 Compare March 13, 2020 17:54
@jeremycole jeremycole removed their request for review June 23, 2021 18:05
shuhaowu added a commit that referenced this pull request Sep 21, 2021
This was manually picked from
#159. I made some minor
adjustments to the comments and the function names to be more clear.

Co-authored-by: Clemens Kolbitsch <kolbitsch@lastline.com>
Co-authored-by: Shuhao Wu <shuhao@shuhaowu.com>
shuhaowu added a commit that referenced this pull request Sep 22, 2021
This was manually picked from
#159. I made some minor
adjustments to the comments and the function names to be more clear.

Co-authored-by: Clemens Kolbitsch <kolbitsch@lastline.com>
Co-authored-by: Shuhao Wu <shuhao@shuhaowu.com>
shuhaowu pushed a commit that referenced this pull request Sep 22, 2021
This commit addresses a data corruption bug in the binlog streaming
phase, where the binlog writer incorrectly propagates values in
fixed-length BINARY columns that have trailing 0s in the original value.
These trailing 0s are removed from the binlog by the SQL master and
therefore do not show up in the WHERE clause for update/delete
statements executed by the binlog writer.

This commit is manually cherry-picked from
#159.
@shuhaowu
Copy link
Contributor

This was cherry-picked into a new PR and merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants