-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary column corruption ruby tests 157 #159
Binary column corruption ruby tests 157 #159
Conversation
I see you made some changes into |
yes, see my comment in the git commit message. the point of the pull-request is to get your feedback on the change... if you disagree this is how we want to implement this, doing a PR to the upstream is a waste of time :-) . But yes, if you agree we want to change it as I propose, we clearly need to send the vendor'ized changes to upstream |
|
||
ghostferry = new_ghostferry(MINIMAL_GHOSTFERRY) | ||
|
||
ghostferry.on_status(Ghostferry::Status::ROW_COPY_COMPLETED) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use BINLOG_STREAMING_STARTED
instead? Should be less racy that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, I actually think I tried that. Maybe I misunderstood what's the sequence of events.
When is binlog streaming started in this scenario - after the copy is complete or before that? I thought the former. Thus, I really want the test to happen after we are copying, because the bug occurs only during binlog streaming, not during the copy (as far as I can tell)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Binlog streaming and data copy begin and occur simultaneously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when the ROW_COPY_COMPLETED
event is sent, the binlog streamer is about to be terminated. I don't recall 100%, but I think there's a strong race possibility that the binlog is not picked up by Ghostferry.
When these events are sent and these ruby blocks are called, the Ghostferry code is blocked from proceeding until the ruby block returns. As such, a more fool-proof test would be something like this:
source_db.query("INSERT INTO ... (binary data...)")
row_copy_called = false
binlog_appled_called = false
ghostferry.on_status(Ghostferry::Status::AFTER_ROW_COPY) do
row_copy_called = true
# select row from the target and then make sure the data with 0 padding is present
# Perform the update query to set the data to `updated_data`.
end
ghostferry.on_status(Ghostferry::Status::AFTER_BINLOG_APPLY) do
binlog_apply_called = true
# select row from the target and make sure the data is on the updated_data
end
# run ghostferry,
# assert table identical
assert binlog_apply_called
assert row_copy_called
# if we want to be defensive, query the target and make sure the data is updated_data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried your suggestion, but am running into two issues with this:
first, updating the source-DB from the context of AFTER_ROW_COPY
fails. Do we hold some type of lock at the time this is invoked?
second, when checking in AFTER_BINLOG_APPLY
, it seems binlog applying isn't fully done yet. Thus, it keep seeing the old data and the validation fails
I understand your proposed code, but seems that's equally racy or something else is off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah you're right. Sorry for the faulty suggestion. In AFTER_BINLOG_APPLY there's indeed a lock due to SELECT FOR UPDATE. I misremembered when I made that suggestion. For now using BINLOG_STREAMING_STARTED
should work slightly better. At some point in the future we can make the "breakpoints" a little bit more robust and less racy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, thanks for the details. No problem for the misdirection - glad I wasn't doing anything wrong
to avoid a misunderstanding: so is there anything else that we should fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than using BINLOG_STREAMING_STARTED
, not now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, then there seems to have been a misunderstanding - that's why I was asking if streaming was started in parallel with the row-copy above (and you answered they are started in parallel...)
if we use BINLOG_STREAMING_STARTED
, we race in the unit-test (and actually usually lose the race), meaning that callback is called before the row-copy has inserted the original data in the target.
Let me know If I understand this issue correctly:
Thus, the fix here is:
Questions:
Additional comments:
|
correct:
as you can see, I inserted same can be seen if I do a simple query later on:
As you can see, I cannot SELECT the data that I INSERTed before. Selecting on the padded string, however, works:
(I'm using the
I have not experimented yet with the copy, but that's my next step. But, yes, I agree it should not be affected
not quite: it does not strip it, it never ends up in the binlog. So, it's the mysql server's "fault", not the replication module's. Essentially ghostferry receives what the user inserted, but not the 0-padding added by mysql on the insert. NOTE: I was not able to find this behavior in the documentation. I'm inferring this from experiments, but I have done a lot of it, so I'm fairly confident what I'm writing here reflects reality. But there could be a different cause for all this that matches the same symptoms.
correct.
correct.
correct.
I believe VAR* types are not affected, as there is no padding. IMO it should only happen for BINARY and CHAR - that is, any column that supports fixed size and where the actual length in the column is not stored as separate attribute on the row-column
oh, thanks for pointing this out! I was already puzzled by this. My local tests seem to work fine. So, I can confirm it works. Is there a way to retrigger QA without updating the PR?
hehe, I think I'll prioritize fixing the other issues I have brought up :-D But, yes, if I happen to see what is causing this, happy to send a PR ;-) |
There should be a rebuild button on CI. You have to be logged into travis via your Github account tho. You might be able to just click rerun on the GH interface as well, but I've never tried. |
I'm logged in, but I can't retrigger the build - I think it's a permission issue. According to the post here: https://stackoverflow.com/questions/17606874/trigger-a-travis-ci-rebuild-without-pushing-a-commit this is only possible if the account has write permissions, which I don't. Do you mind retriggering it? Assuming these are perma-links , you should be able to retrigger it here: https://travis-ci.com/github/Shopify/ghostferry/builds/152736013 (for me, there is no `restart build" button as shown on the stackoverflow post) |
Triggered. Sorry about the messy state of CI. You can also push some empty commits to retrigger builds. Back to the topic at hand, I think you're right about this being a mysql "bug". I just did my own tests:
Using
Specifically, the line says: |
Since you're looking a little more closely at go-mysql, here are some links of how some other code are handling BINARY in replication. Edit: doesn't seem to affect CHAR:
|
oh, interesting. Essentially the java-equivalent of what I suggested. I think we agree on how to move forward then, correct? I'll go and try to get the patch into the upstream of go-mysql so we can use it here... |
After some further investigation on related projects, it seems that others are running into the same issue. Specifically, shyiko/mysql-binlog-connector-java#169 is like siddontang/go-mysql, which is unable to fix this issue it seems. Debezium is a consumer of that library much like Ghostferry is, and they documented the issue here: https://issues.redhat.com/browse/DBZ-254. And their solution sounds very similar to yours and is implemented at debezium/debezium#237. I'm going to review that PR to see if we're doing anything differently here. |
haha, I can't believe how similar the examples between https://dev.mysql.com/doc/refman/5.7/en/binary-varbinary.html and my post above are - what a coincidence :-) But, glad to see that we have a concrete page to refer to . Working on a PR to go-mysql now |
Just reviewed the Debezium PR. For future reference, Debezium's fix is here: https://github.com/debezium/debezium/blob/2c56997/debezium-connector-mysql/src/main/java/io/debezium/connector/mysql/MySqlValueConverters.java#L611-L614 This approach is exactly the same as the ones proposed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach looks good to me. We need to merge the upstream changes but idk if we are ready to update go-mysql, so maybe a vendor patch here is fine.
dml_events.go
Outdated
@@ -406,17 +410,21 @@ func Int64Value(value interface{}) (int64, bool) { | |||
// | |||
// ref: https://github.com/mysql/mysql-server/blob/mysql-5.7.5/mysys/charset.c#L963-L1038 | |||
// ref: https://github.com/go-sql-driver/mysql/blob/9181e3a86a19bacd63e68d43ae8b7b36320d8092/utils.go#L717-L758 | |||
func appendEscapedString(buffer []byte, value string) []byte { | |||
func appendEscapedString(buffer []byte, value string, padToLength uint) []byte { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename this to rightPadLengthForBinaryColumn
to make it more verbose? Can you also leave some comments in the code to link to this PR so people in the future can figure out why this is done?
dml_events.go
Outdated
c := value[i] | ||
if c == '\'' { | ||
buffer = append(buffer, '\'', '\'') | ||
} else { | ||
buffer = append(buffer, c) | ||
} | ||
} | ||
for ; i < int(padToLength); i++ { | ||
buffer = append(buffer, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Appending 0 is OK here? Don't need something like \0
? I ask because this buffer is eventually going to be turned into a string
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buffer
is a []byte
. As such, 0 seems the right thing to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but buffer
eventually gets turned into a string
, see https://github.com/Shopify/ghostferry/blob/e3f52d4/dml_events.go#L286. I'm not certain if that's the right encoding for the query as I'd assume MySQL wants you to specify \0
in a statement, especially since we're not using prepared statements here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I saw your comment too late and I just pushed the new patch-set. Let me check a bit more - I thought the binary conversion would end up being the same thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
admittedly: I'm coming from a c/c++ and Python background, so I don't know if it really makes a difference
in golang, 0
is a int
while '\x00'
is a int32
but they should be the same value when encoding. But I have no problem using '\x00'
if that's cleaner
dml_events.go
Outdated
c := value[i] | ||
if c == '\'' { | ||
buffer = append(buffer, '\'', '\'') | ||
} else { | ||
buffer = append(buffer, c) | ||
} | ||
} | ||
for ; i < int(padToLength); i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a bit clever such that it might confuse future readers, but may be OK.
|
||
ghostferry = new_ghostferry(MINIMAL_GHOSTFERRY) | ||
|
||
ghostferry.on_status(Ghostferry::Status::ROW_COPY_COMPLETED) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when the ROW_COPY_COMPLETED
event is sent, the binlog streamer is about to be terminated. I don't recall 100%, but I think there's a strong race possibility that the binlog is not picked up by Ghostferry.
When these events are sent and these ruby blocks are called, the Ghostferry code is blocked from proceeding until the ruby block returns. As such, a more fool-proof test would be something like this:
source_db.query("INSERT INTO ... (binary data...)")
row_copy_called = false
binlog_appled_called = false
ghostferry.on_status(Ghostferry::Status::AFTER_ROW_COPY) do
row_copy_called = true
# select row from the target and then make sure the data with 0 padding is present
# Perform the update query to set the data to `updated_data`.
end
ghostferry.on_status(Ghostferry::Status::AFTER_BINLOG_APPLY) do
binlog_apply_called = true
# select row from the target and make sure the data is on the updated_data
end
# run ghostferry,
# assert table identical
assert binlog_apply_called
assert row_copy_called
# if we want to be defensive, query the target and make sure the data is updated_data.
cool, working on a final PS
ok, then let's start with a vendor-patch and - in parallel - work on getting our changes into upstream |
1a9ea62
to
e7fb1ba
Compare
inserted_data = "ABC\x00" | ||
updated_data = "EFGH" | ||
|
||
source_db.query("INSERT INTO #{DEFAULT_FULL_TABLE_NAME} (id, data) VALUES (1, _binary'#{inserted_data}')") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be inserting a value that is shorter than the column definition(eg: "ABC"
) and let mysql add the trailing 0s to reproduce the exact same issue reported here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interestingly mysql seems to even strip the trailing whitespace if done like this. But your suggestion is perfectly valid and maybe a slightly better test-case
e7fb1ba
to
825bd63
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix! A few comments
test/integration/types_test.rb
Outdated
# update/delete statements, as the WHERE clause would not match existing | ||
# rows in the target DB | ||
inserted_data = "ABC" | ||
expected_inserted_data = inserted_data + "\x00" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nitpick, but in ruby
this is preferably expressed as:
expected_inserted_data = inserted_data + "\x00" | |
expected_inserted_data = "#{inserted_data}\x00" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
test/integration/types_test.rb
Outdated
|
||
[source_db, target_db].each do |db| | ||
db.query("CREATE DATABASE IF NOT EXISTS #{DEFAULT_DB}") | ||
db.query("CREATE TABLE IF NOT EXISTS #{DEFAULT_FULL_TABLE_NAME} (id bigint(20) not null auto_increment, data BINARY(4), primary key(id))") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test for a case when the length is 0 (ex: create table a (d binary);) and for the decimal type above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test for a case when the length is 0 (ex: create table a (d binary);)
sure
and for the decimal type above?
Just to avoid an unnecessary round-trip: what do you mean by that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the decimal.Decimal
case, the function is called with a length of 0
- appendEscapedString(buffer, v.String(), 0)
. I meant a test case of length 0
would cover both the decimal.Decimal
case as well as if the length of the column was 0
(ie: create table a (d binary);
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. I thought we already had a test-case for decimals, but I'll check.
dml_events.go
Outdated
// | ||
// We need to support right-padding of the generated string using 0-bytes to | ||
// mimic what a MySQL server would do for BINARY columns (with fixed length). | ||
// | ||
// ref: https://github.com/Shopify/ghostferry/pull/159 | ||
// | ||
// This is specifically mentioned in the the below link: | ||
// | ||
// When BINARY values are stored, they are right-padded with the pad value | ||
// to the specified length. The pad value is 0x00 (the zero byte). Values | ||
// are right-padded with 0x00 for inserts, and no trailing bytes are removed | ||
// for retrievals. | ||
// | ||
// ref: https://dev.mysql.com/doc/refman/5.7/en/binary-varbinary.html | ||
func appendEscapedString(buffer []byte, value string, rightPadLengthForBinaryColumn uint) []byte { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think moving the 0-byte padding into a separate function would be cleaner, as the padding doesn't really have anything to do with properly escaping strings. That way, we only need to call it if column.Type == schema.TYPE_BINARY
and don't need to modify the case of decimal.Decimal
Reusing the naming, we can make a function like:
func rightPadLengthForBinaryColumn(buffer []byte, columnLength uint) []byte {
...
}
and just move the logic there. It can be called directly after appendEscapedString
in line 357.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true - seems cleaner and avoids the other invocation of the method with the awkward 0
parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, let me take that back. I agree it's cleaner, but the method currently does the quoting (as well as escaping the '
) and knows about the length of the string
sure, all of that can be broken apart, but does that make it more readable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I think handling the trailing '\''
that is appended to the buffer
in a separate method would make this less readable.
Another option would be to pass the column
to the appendEscapedString
method and conditionally handle the padding in that method:
// We need to support right-padding of the generated string using 0-bytes to
// mimic what a MySQL server would do for BINARY columns (with fixed length).
//
// ref: https://github.com/Shopify/ghostferry/pull/159
//
// This is specifically mentioned in the the below link:
//
// When BINARY values are stored, they are right-padded with the pad value
// to the specified length. The pad value is 0x00 (the zero byte). Values
// are right-padded with 0x00 for inserts, and no trailing bytes are removed
// for retrievals.
//
// ref: https://dev.mysql.com/doc/refman/5.7/en/binary-varbinary.html
func rightPadLengthForBinaryColumn(buffer []byte, padLength int) []byte {
// continue 0-padding up to the desired length as provided by the
// caller
for i := 0; i < int(padLength); i++ {
buffer = append(buffer, '\x00')
}
return buffer
}
// appendEscapedString replaces single quotes with quote-escaped single quotes.
// When the NO_BACKSLASH_ESCAPES mode is on, this is the extent of escaping
// necessary for strings.
//
// ref: https://github.com/mysql/mysql-server/blob/mysql-5.7.5/mysys/charset.c#L963-L1038
// ref: https://github.com/go-sql-driver/mysql/blob/9181e3a86a19bacd63e68d43ae8b7b36320d8092/utils.go#L717-L758
func appendEscapedString(buffer []byte, value string, column schema.TableColumn) []byte {
buffer = append(buffer, '\'')
for i := 0; i < len(value); i++ {
c := value[i]
if c == '\'' {
buffer = append(buffer, '\'', '\'')
} else {
buffer = append(buffer, c)
}
}
if column.Type == schema.TYPE_BINARY {
padLength := int(column.FixedSize) - len(value)
buffer = rightPadLengthForBinaryColumn(buffer, padLength)
}
return append(buffer, '\'')
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO appendEscapedString
is a pure string/conversion method and should not have to worry about columns or types. I have refactored out the trivial method for 0-padding now, but I feel making the former method aware of columns or types seems going the wrong direction (making the method less useful in other scenarios)
it's a nit and I'm fine either way, but I feel passing the column is wrong here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly - just that it seemed a bit awkward to be forcing a param for all cases when not all applied. Either way can work 👍
825bd63
to
e64ef4e
Compare
dml_events.go
Outdated
// continue 0-padding up to the desired length as provided by the | ||
// caller | ||
for ; i < int(rightPadLengthForBinaryColumn); i++ { | ||
buffer = append(buffer, '\x00') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to say that we're supposed to generate MySQL queries here. Is having a \0
valid in a query? Or should it be "\0"
? The difference is that the latter is the literal ascii character '\'
and '0'
while the former is actually just 0
, if that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh, I agree :-) My original code was injecting literal 0
(that is, the byte, not the character), and then I switched it to be \x00
because it was suggested as improvement.
Still, please note that in neither case are we inserting a 2-character string of \0
(that is, a \
and a 0
).
The only difference I can see is that one is an int and the other an int32. However: compared to you guys, I'm probably a golang noob and may simply be wrong here.
But, yes, we are building SQL statements, and we want to have literal 0-bytes (not 0
strings or \0
strings) be injected.
e64ef4e
to
7c3890d
Compare
This commit addresses a data corruption bug in the binlog streaming phase, where the binlog writer incorrectly propagates values in fixed-length BINARY columns that have trailing 0s in the original value. These trailing 0s are removed from the binlog by the SQL master and therefore do not show up in the WHERE clause for update/delete statements executed by the binlog writer. NOTE: This commit requires changes to one of the vendor'ed modules in github.com/siddontang/go-mysql that we patch directly in the local repo. We are working on getting these changes into the upstream module and will need to merge these changes once we can use the latest upstream module version. Change-Id: Ib9c1b7308e8198f1fd38439c37f444d9a8154e6a
7c3890d
to
37f2188
Compare
This was manually picked from #159. I made some minor adjustments to the comments and the function names to be more clear. Co-authored-by: Clemens Kolbitsch <kolbitsch@lastline.com> Co-authored-by: Shuhao Wu <shuhao@shuhaowu.com>
This was manually picked from #159. I made some minor adjustments to the comments and the function names to be more clear. Co-authored-by: Clemens Kolbitsch <kolbitsch@lastline.com> Co-authored-by: Shuhao Wu <shuhao@shuhaowu.com>
This commit addresses a data corruption bug in the binlog streaming phase, where the binlog writer incorrectly propagates values in fixed-length BINARY columns that have trailing 0s in the original value. These trailing 0s are removed from the binlog by the SQL master and therefore do not show up in the WHERE clause for update/delete statements executed by the binlog writer. This commit is manually cherry-picked from #159.
This was cherry-picked into a new PR and merged to master. |
Link: #157