Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update go mysql to address a very rare data corruption problem #307

Merged
merged 2 commits into from
Sep 22, 2021

Conversation

shuhaowu
Copy link
Contributor

@shuhaowu shuhaowu commented Sep 21, 2021

The siddontang version of go-mysql has been updated to https://github.com/go-mysql-org/go-mysql. I had to do a rename for everything.

Technically, we don't need this to fix the BINARY column problem (basically i need to rebase #159), as the latest version we pin already has the fix we need in the upstream library, but I thought this is a good chance to do this as it is a decent chunk of work anyway.

Changelog: go-mysql-org/go-mysql@803944a...v1.3.0. Highlights:

Minor problem with the YEAR column

It seems like if there are values of 0 with columns with type YEAR(4), Ghostferry will corrupt data, but will be caught by the inline verifier. The problem occurs when we UPDATE an existing YEAR column that has been copied by the DataIterator from or to a value of 0, which is a possible value in non-strict mode. The full scale of this issue has not been investigated, as this is a rarely used data type.

I'm not sure if the upstream fix is sufficient, as I have doubts over how it works (simply adding the value by 1900 seems wrong). More investigation is likely required.

Rare data loss due to binlog position comparison bug when binlog file numbering reaches 1M+

There's a rare bug in the upstream go-mysql library that is fixed in this upstream PR that can result in a rare data loss event if: (1) Ghostferry is currently streaming binlog file 999999 and sets a target position of 1000000. This is because the mysql.Position.Compare method was performing a string comparison of the binlog filenames, and thus incorrectly thinks that binlog file 999999 is larger than 1000000, which would then result in the early termination of the binlog streamer, which can cause data loss. The relevant BinlogStreamer code is here. This is confirmed with the following sample code:

ackage main

// import "github.com/siddontang/go-mysql/mysql"
 import "github.com/go-mysql-org/go-mysql/mysql"
import "fmt"

func main() {
        p1 := mysql.Position{"mysql-bin.999999", 4}
        p2 := mysql.Position{"mysql-bin.1000000", 4} // 1M case

        fmt.Println(p1, p2)
        fmt.Println("expected = -1, actual =", p1.Compare(p2))
        fmt.Println("expected = 1, actual =", p2.Compare(p1))

        p1 = mysql.Position{"mysql-bin.9999999", 4}
        p2 = mysql.Position{"mysql-bin.10000000", 4} // 10M case just in case..

        fmt.Println(p1, p2)
        fmt.Println("expected = -1, actual =", p1.Compare(p2))
        fmt.Println("expected = 1, actual =", p2.Compare(p1))
}

When executed with the vendored go-mysql library currently in the repo, we get:

(mysql-bin.999999, 4) (mysql-bin.1000000, 4)
expected = -1, actual = 1
expected = 1, actual = -1
(mysql-bin.9999999, 4) (mysql-bin.10000000, 4)
expected = -1, actual = 1
expected = 1, actual = -1

With 1.3.0 as I updated with this PR:

(mysql-bin.999999, 4) (mysql-bin.1000000, 4)
expected = -1, actual = -1
expected = 1, actual = 1
(mysql-bin.9999999, 4) (mysql-bin.10000000, 4)
expected = -1, actual = -1
expected = 1, actual = 1

This should be a very very rare condition, but a serious one as no online verifiers will be able to catch this kind of data loss.

@shuhaowu shuhaowu marked this pull request as ready for review September 21, 2021 19:58
@caution-tape-bot
Copy link

We noticed that this PR either modifies or introduces code that pipes the output of curl or wget into a shell. This is dangerous because it grants full control over the machine that downloads and executes the script, with no way to control or monitor what is executed.
Code should be fetched in a way where its authenticity can be verified, for instance through a package manager, or by verifying a cryptographic hash or signature. If you are unsure how to do this, please ask in #help-appsec and we'll be happy to take a look. cc: @Shopify/appsec-breakers

@shuhaowu shuhaowu changed the title Update go mysql Update go mysql to address a very rare data corruption problem Sep 21, 2021
Comment on lines +110 to +111
this.Require().Contains(tableSchemaCache, testhelpers.TestSchemaName+"."+table)
this.Require().Equal(1, len(tableSchemaCache[testhelpers.TestSchemaName+"."+table].PKColumns))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitespace change on purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the default Go formatter on all these files. I also dislike how they collapse spaces around these operators, but Go formatter doesn't allow you to customize because all Go code must look the same i guess.

@shuhaowu shuhaowu merged commit 8e83826 into master Sep 22, 2021
@shuhaowu shuhaowu deleted the update-go-mysql branch September 22, 2021 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants