VTShovel - VReplication support for external databases #5289

Merged
merged 33 commits into from
Dec 5, 2019

@rafael rafael commented Oct 10, 2019

Description

Have you ever wanted to leverage the powers of vreplication outside the environment of Vitess? Do you dream about copying bytes? The following PR will have a solution for you.

Introducing vtshovel: a flexible tool that allows you to create vreplication streams directly from MySQL instances outside of the Vitess ecosystem.

To give a bit of context about the motivation for this tool: we (Slack) are in the process of migrating entire databases from our legacy MySQL clusters to Vitess. We plan to use this tool to keep MySQL instances in our legacy clusters in sync with their Vitess counterparts.

We think other folks might find it useful to have a tool like this when doing migrations.

Core Design

  • The core design for this feature is to leverage the whole vreplication framework. In addition to vttablets, vreplication streams can now point to external databases. This is done by introducing an abstraction that implements the vstreamer methods:
    // VStreamerClient exposes the core interface of a vstreamer
    type VStreamerClient interface {
        // Open sets up all the environment for a vstream
        Open(ctx context.Context) error
        // Close closes a vstream
        Close(ctx context.Context) error
        // VStream streams VReplication events based on the specified filter.
        VStream(ctx context.Context, startPos string, filter *binlogdatapb.Filter, send func([]*binlogdatapb.VEvent) error) error
        // VStreamRows streams rows of a table from the specified starting point.
        VStreamRows(ctx context.Context, query string, lastpk *querypb.QueryResult, send func(*binlogdatapb.VStreamRowsResponse) error) error
    }
    
  • Depending on the configuration of the vreplication stream, a vplayer will choose between a TabletVStreamerClient and a MySQLVStreamerClient.
  • There is some technical debt in the way we choose credentials for the external MySQL. At the moment we don't have a good way to do that. To avoid increasing the scope of this PR, we added an erepl user to dbconfigs.
  • I added good test coverage to vstreamer_client and also an integration test for the vplayer.

Additional changes

  • This PR also adds support for statement-based replication. Binlogplayer can understand both statement- and row-based replication. Certain types of filters won't be supported in statement-based mode, and the stream will fail in such cases. At the moment, only match-all rules are supported for statement-based replication streams.

  • It also adds support for streaming without using GTIDs. This was done by cutting some corners, but it will be cleaned up soon. @sougou and I are working on that.
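
As a rough illustration of the match-all restriction for statement-based streams, a check along these lines could decide up front whether statement events are acceptable. This is a sketch under assumptions, not the logic in this PR: the `rule` type, the `canAcceptStatements` function, the `"/.*"` match-all pattern, and the choice to require that every rule be match-all are all hypothetical.

```go
package main

import "fmt"

// rule is a hypothetical, simplified stand-in for a vreplication filter rule.
// Match selects tables; Filter optionally narrows the rows or columns.
type rule struct {
	Match  string
	Filter string
}

// canAcceptStatements reports whether a stream with these rules could apply
// statement events as-is. The assumption here is that a statement cannot be
// rewritten or filtered, so only plain match-all rules qualify.
func canAcceptStatements(rules []rule) bool {
	if len(rules) == 0 {
		return false
	}
	for _, r := range rules {
		if r.Match != "/.*" || r.Filter != "" {
			return false
		}
	}
	return true
}

func main() {
	matchAll := []rule{{Match: "/.*"}}
	filtered := []rule{{Match: "customer", Filter: "select * from customer where id > 10"}}

	fmt.Println("match-all rules accept statements:", canAcceptStatements(matchAll))
	fmt.Println("filtered rules accept statements:", canAcceptStatements(filtered))
}
```

Computing this once when the stream is created lets it fail early with a clear error instead of part-way through applying events.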

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
* Adds support for VStream to start from filename:pos and not gtid sets.
* Adds support for statement based replication streams (this should only be used
  in the context of mysql streamer, it is not safe for tablet vreplication).
* Adds support to run vstream from mysql directly

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
@rafael rafael requested a review from sougou as a code owner October 10, 2019 20:51
* Adds binary to run vtshovel.
* At the moment only working in ephemeral mode (i.e no data is persisted back to
  vrsettings).
* vtshovel only works for statement based replication right now. This is due to
  not having a good way to have a schema loader. We will iterate on this.

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
* This will be removed in a future PR. Adding while in POC.

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>

@sougou sougou left a comment

The approach is very nice overall. A few minor nits.

go/vt/srvtopo/resilient_server.go (outdated, resolved)
go/mysql/flavor.go (outdated, resolved)
go/mysql/replication_position.go (outdated, resolved)
go/vt/dbconfigs/dbconfigs.go (resolved)
go/vt/vttablet/tabletmanager/action_agent.go (resolved)
go/vt/vttablet/tabletmanager/vreplication/vplayer.go (outdated, resolved)

// NewMySQLVStreamerClient is a vstream client that allows you to stream directly from MySQL.
// In order to achieve this, the following creates a vstreamer Engine with a dummy in memorytopo.
func NewMySQLVStreamerClient(sourceConnParams *mysql.ConnParams) *MySQLVStreamerClient {

I'm thinking this function can pull the dbconfigs based on the external repl user name. Then you don't have to pass it through to vreplication.Engine.

@rafael (author) replied:

My thinking here is that the end goal is to be able to point to any external DB; having this parameter here will make it more flexible.

I think we shouldn't rely heavily on the repl username, as we would like to refactor that soon.

What do you think?

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
@rafael rafael changed the title from "WIP - VTShovel POC" to "VTShovel - VReplication support for external databases" Nov 6, 2019
* At the moment we only support the erepl user. Passing source conn params around
was adding unnecessary complexity.
* This cleans that up and makes it more explicit that only the erepl user is
supported. In the future we will add more flexibility in terms of what kind of
users can be configured for external vreplication streams.

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
* Fix typo in some comments.
* Make VReplicator private again. This change is no longer needed. Originally we
wanted "vtshovel" to be an external process. Given that this now hooks into the
existing engine, there is no need to make this public.

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
* StripChecksum was changing the type of the event. This was a bug.
* Adds test to vstreamer to reflect new support for statement based replication

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>

@sougou sougou left a comment

Looking good. Couple of comments.

proto "github.com/golang/protobuf/proto"
math "math"

This is a conflict between grpc code gen and goimports. If you re-run goimports, all these files will revert to unchanged. Or, you can just manually revert these yourself.

go/vt/vttablet/tabletmanager/vreplication/vplayer.go (outdated, resolved)
* Compute canAcceptStmtEvents when creating vplayer.

Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
rafael commented Nov 11, 2020

Hi @jawabuu, this code did not end up landing as a separate binary. It is intended to be run as part of vttablet: it makes it possible to have VReplication streams where the source is external.

rafael commented Nov 11, 2020

There is some discussion about how this is used in our Slack community: https://vitess.slack.com/archives/C0PQY0PTK/p1604989571062700?thread_ts=1579649445.062400&cid=C0PQY0PTK

systay pushed a commit that referenced this pull request Jul 22, 2024