Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/loader: add pkg to load data to mysql #436

Merged
merged 28 commits into from
Jan 17, 2019
Merged

pkg/loader: add pkg to load data to mysql #436

merged 28 commits into from
Jan 17, 2019

Conversation

july2993
Copy link
Contributor

@july2993 july2993 commented Jan 7, 2019

What problem does this PR solve?

a pkg to use to load data to mysql in realtime, can be use in test and reparo, drainer later

#438 contains only update vendor part, you can review and merge that first

What is changed and how it works?

see tests/kafka/kafka.go as a example,
most logic is same as drainer, but if the table has PK, it will merge the same PK record(see merge.go), after merge, only have one record for one PK, so we can easily do bulk insert. this is also beneficial if the upstream update some hot key frequently.

more thinks maybe we can do:

  • currently like in drainer, for update event we get all field and set all field in the update sql, but in most time only one or a few field is updated really, we can only set that fields to reduce data send to downstream.

Check List

Tests

  • Integration test

Code changes

Side effects

Related changes

@july2993
Copy link
Contributor Author

july2993 commented Jan 7, 2019

/run-all-tests

@IANTHEREAL
Copy link
Collaborator

currently like in drainer, for update event we get all field and set all field in the update sql, but in most time only one or a few field is updated really, we can only set that fields to reduce data send to downstream.

we did it in syncer ago, but some column had trigger attributes, if we can solve it, I like this way

@IANTHEREAL
Copy link
Collaborator

update bench test result?

@july2993
Copy link
Contributor Author

july2993 commented Jan 7, 2019

a bench test of bench_test.go running locally with downstream mysql 5.7

worker-count&batch-size: 16 128
BenchmarkInsertKey 1000000 26680 ns/op 37481op/second
BenchmarkInsertNoKey 300000 72654 ns/op 13763 op/second

change worker-count&batch-size: 32 256
BenchmarkInsertKey 3000000 11767 ns/op 84983 op/second
BenchmarkInsertNoKey 300000 71042 ns/op 14076 op/second

@july2993
Copy link
Contributor Author

july2993 commented Jan 8, 2019

@GregoryIan PTAL

pkg/loader/executor.go Show resolved Hide resolved
pkg/loader/executor.go Outdated Show resolved Hide resolved
pkg/loader/executor.go Outdated Show resolved Hide resolved
pkg/loader/merge.go Show resolved Hide resolved
pkg/loader/merge.go Outdated Show resolved Hide resolved
pkg/loader/load.go Show resolved Hide resolved
pkg/loader/load.go Outdated Show resolved Hide resolved
pkg/loader/load.go Outdated Show resolved Hide resolved
pkg/loader/model.go Outdated Show resolved Hide resolved
return
}

func quoteSchema(schema string, table string) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some problem about tidb version dependency, keep this simple func

pkg/loader/load.go Outdated Show resolved Hide resolved
pkg/loader/load.go Show resolved Hide resolved
pkg/loader/load.go Show resolved Hide resolved
pkg/loader/model.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@IANTHEREAL IANTHEREAL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two important things about it:

  1. export input channel and successful channel to control load
    how does user know TXNs were all finished?
    compare one by one,
    or sort them and get one successful TXN, then we can know the TXNs before it were already finished.

It's better to have a README.md and example

  1. lack of metrics and log

pkg/loader/load.go Outdated Show resolved Hide resolved
pkg/loader/load.go Outdated Show resolved Hide resolved
pkg/loader/util.go Outdated Show resolved Hide resolved
pkg/loader/util.go Show resolved Hide resolved
pkg/loader/util.go Show resolved Hide resolved
pkg/loader/load.go Outdated Show resolved Hide resolved
pkg/loader/model.go Outdated Show resolved Hide resolved
pkg/loader/model.go Outdated Show resolved Hide resolved
pkg/loader/load.go Outdated Show resolved Hide resolved
var singleDMLs []*DML

for tableName, tableDMLs := range tables {
if len(tableDMLs[0].primaryKeys()) > 0 && s.merge {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we said, we should consider unique keys relations on multiple rows, I think we should check primary key count = 1 and unique key = 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see

// we merge dmls by primary key, after merge by key, we

but i can add the limit or add an option to open this feature now, maybe use it after more test and prove, or when meet performance problem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update: 5f5873b

@IANTHEREAL
Copy link
Collaborator

Ping @july2993, we can speed this pr

@july2993
Copy link
Contributor Author

add example and metrics @GregoryIan @WangXiangUSTC PTAL

kennytm and others added 10 commits January 16, 2019 11:14
Co-Authored-By: july2993 <july2993@gmail.com>
Co-Authored-By: july2993 <july2993@gmail.com>
Co-Authored-By: july2993 <july2993@gmail.com>
Co-Authored-By: july2993 <july2993@gmail.com>
Co-Authored-By: july2993 <july2993@gmail.com>
Co-Authored-By: july2993 <july2993@gmail.com>
Co-Authored-By: july2993 <july2993@gmail.com>
Co-Authored-By: july2993 <july2993@gmail.com>
@july2993
Copy link
Contributor Author

@GregoryIan @kennytm @WangXiangUSTC PTAL

Copy link
Contributor

@kennytm kennytm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

tableName := inserts[0].Table

builder := new(strings.Builder)
builder.WriteString(fmt.Sprintf("REPLACE INTO %s(%s) VALUES ", quoteSchema(dbName, tableName), buildColumnList(info.columns)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@july2993 This isn't resolved 🙃

@kennytm kennytm removed their assignment Jan 17, 2019
@july2993
Copy link
Contributor Author

@WangXiangUSTC @kennytm PTAL

@WangXiangUSTC
Copy link
Contributor

LGTM

@july2993 july2993 merged commit b614c9c into master Jan 17, 2019
@july2993 july2993 mentioned this pull request Feb 26, 2019
july2993 added a commit that referenced this pull request Feb 27, 2019
* pkg/loader: add pkg to load data to mysql (#436)

* tests/* add swap unique index value test (#437)

* fix json type and may lost data when restart (#463)

* translate.go use string type instead of []byte for json field
@july2993 july2993 deleted the hjh/loader branch May 4, 2019 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants