pkg/loader: add pkg to load data to mysql #436
Merged
Changes from 17 commits
28 commits
32a119f pkg/loader: add pkg to load data to mysql (july2993)
bc7d1a6 vendor/* update vendor add errgroup (july2993)
c12019c Merge branch 'master' into hjh/loader (july2993)
9c9fd60 add a update pk&uk case and add comment about mergeByKey (july2993)
296f878 Merge remote-tracking branch 'origin/hjh/loader' into hjh/loader (july2993)
45d68fd executor.go: Fix forgot to return err and quote column name (july2993)
f85ec8b bench_test.go set merge or not and add delete&update bench (july2993)
d4f5cf7 loader/* don't always chagne insert -> replace refine some code (july2993)
07cb6c9 Merge branch 'master' into hjh/loader (july2993)
b229f43 add example and refine code (july2993)
b9eb9ce add metrics of loader, change NewLoader api (july2993)
03d05d1 Merge remote-tracking branch 'origin/hjh/loader' into hjh/loader (july2993)
5f5873b load.go: merge pk and batch if and only if have pk and no uk (july2993)
a06a5f4 load.go: fix not add item when DetectConflict (july2993)
9b506fc loader:* remove useless code and use utf8mb4 (july2993)
8684e66 load.go: Simplify some log (july2993)
65232ad add README.md (july2993)
9c0bb48 Update pkg/loader/example_loader_test.go (kennytm)
83ba23c Update pkg/loader/README.md (kennytm)
e11c7ac Update pkg/loader/README.md (kennytm)
38850e1 Update pkg/loader/README.md (kennytm)
84c637a Update pkg/loader/merge_test.go (kennytm)
81fb7b1 Update pkg/loader/load.go (kennytm)
b36ff7c Update pkg/loader/README.md (kennytm)
54aa4d8 Update pkg/loader/README.md (kennytm)
c23895f loader: address comments and use strings.Builder (july2993)
22076e0 merge.go factor some common case out (july2993)
ff844d0 executor.go: Use fmt.Fprintf write data to builder (july2993)
@@ -0,0 +1,31 @@
loader
======

A package for loading data into MySQL in real time. It is intended to give tools such as *reparo* and *drainer* a single, unified loader.

### Getting started
- An example is available in [example_loader_test.go](./example_loader_test.go).

To use *Loader*, you need to write a translator, like *SlaveBinlogToTxn* in [translate.go](./translate.go), that converts upstream events into the transactions *Loader* consumes.

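As a rough illustration of what such a translator does, the sketch below converts a hypothetical upstream event type (`RowChange`, not part of this package) into a transaction. The `Txn`, `DML`, and `DMLType` definitions are simplified stand-ins modelled on the fields used in this PR's benchmark code, so the real types may differ; the function name `RowChangesToTxn` is also an assumption.

```go
package main

import "fmt"

// DMLType, DML and Txn are simplified stand-ins for the package's types,
// modelled on the fields used in the benchmark code of this PR.
type DMLType int

const (
	InsertDMLType DMLType = iota + 1
	UpdateDMLType
	DeleteDMLType
)

type DML struct {
	Database  string
	Table     string
	Tp        DMLType
	Values    map[string]interface{}
	OldValues map[string]interface{}
}

type Txn struct {
	DMLs []*DML
}

// AppendDML adds one row change to the transaction.
func (t *Txn) AppendDML(dml *DML) { t.DMLs = append(t.DMLs, dml) }

// RowChange is a hypothetical upstream event, used only for illustration.
type RowChange struct {
	Schema, Table string
	Op            string // "insert", "update" or "delete"
	Before, After map[string]interface{}
}

// RowChangesToTxn translates one upstream transaction into a Txn.
func RowChangesToTxn(changes []RowChange) (*Txn, error) {
	txn := new(Txn)
	for _, c := range changes {
		dml := &DML{Database: c.Schema, Table: c.Table}
		switch c.Op {
		case "insert":
			dml.Tp = InsertDMLType
			dml.Values = c.After
		case "update":
			dml.Tp = UpdateDMLType
			dml.OldValues = c.Before
			dml.Values = c.After
		case "delete":
			// The benchmark in this PR puts the deleted row in Values.
			dml.Tp = DeleteDMLType
			dml.Values = c.Before
		default:
			return nil, fmt.Errorf("unsupported operation %q", c.Op)
		}
		txn.AppendDML(dml)
	}
	return txn, nil
}
```

The resulting `*Txn` is what would be sent on the loader's input channel, as the benchmark code does with `loader.Input() <- txn`.
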
## Overview
Loader splits the DML events of upstream transactions and loads them into MySQL concurrently, sharded by primary key or unique key. Causality between events is resolved by [causality.go](./causality.go).

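The idea of sharding by key can be sketched as follows. This is a minimal illustration, not the code in this package: the `event` type, the key format, and the FNV hash are assumptions, and the sketch does not model the conflict detection done in causality.go. It only shows why events for the same key keep their relative order when each key always maps to the same worker.

```go
package main

import "hash/fnv"

// event is a hypothetical DML event identified by the key of the row it touches.
type event struct {
	Key string // for example "test.test1:id=1"
	SQL string
}

// shardFor maps a key to a worker index in [0, workers).
// workers must be greater than zero.
func shardFor(key string, workers int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(workers))
}

// dispatch partitions events into one queue per worker. Events that share a
// key land in the same queue and keep their original order.
func dispatch(events []event, workers int) [][]event {
	queues := make([][]event, workers)
	for _, e := range events {
		i := shardFor(e.Key, workers)
		queues[i] = append(queues[i], e)
	}
	return queues
}
```

Each queue could then be drained by its own goroutine, so rows with different keys are written in parallel while changes to a single row stay sequential.
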
## Optimization
#### Large Operation
Instead of executing DML statements one by one, we can combine many small operations into a single large one, for example an INSERT statement with multiple VALUES lists that inserts several rows at a time. This can be [much faster](https://medium.com/@benmorel/high-speed-inserts-with-mysql-9d3dcd76f723) than inserting rows one by one.

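A multi-row INSERT can be assembled roughly as below. This is a sketch under stated assumptions rather than the package's executor: the helper name `buildMultiInsert` is hypothetical, and identifiers are quoted naively with backticks without escaping.

```go
package main

import "strings"

// buildMultiInsert returns a single INSERT statement with one placeholder
// group per row, plus the flattened argument list for a prepared statement.
// Table and column names are assumed not to contain backticks.
func buildMultiInsert(table string, columns []string, rows [][]interface{}) (string, []interface{}) {
	var b strings.Builder
	b.WriteString("INSERT INTO `" + table + "` (")
	for i, c := range columns {
		if i > 0 {
			b.WriteByte(',')
		}
		b.WriteString("`" + c + "`")
	}
	b.WriteString(") VALUES ")

	// One "(?,?,...)" group per row.
	group := "(" + strings.TrimSuffix(strings.Repeat("?,", len(columns)), ",") + ")"
	args := make([]interface{}, 0, len(rows)*len(columns))
	for i, row := range rows {
		if i > 0 {
			b.WriteByte(',')
		}
		b.WriteString(group)
		args = append(args, row...)
	}
	return b.String(), args
}
```

The returned query and arguments can be passed to `(*sql.DB).Exec(query, args...)`, so a batch of rows costs one round trip instead of one per row.
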
#### Merge by Primary Key
You may want to read about [log compaction](https://kafka.apache.org/documentation/#compaction) in Kafka first.

A table with a primary key can be treated like a KV store: to rebuild the table from its change history, we only need the last value for every key.

While syncing data downstream in real time, we can fetch DML events from upstream in batches and merge them by key. After merging, there is only one event per key, so the downstream executes fewer events than the upstream did. This also helps us use batch insert operations.

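The KV-store view above can be sketched as a last-write-wins compaction. The `op` type and `compact` function are hypothetical and deliberately simpler than a real DML merge: every change is reduced to either "put this value" or "delete this key", and only the latest operation per key survives.

```go
package main

// op is a hypothetical change to one row, keyed by its primary key.
type op struct {
	Key    int
	Delete bool // true for a delete, false for an insert or update
	Value  int  // ignored when Delete is true
}

// compact keeps only the last operation for every key. Keys appear in the
// output in the order in which they were first seen.
func compact(ops []op) []op {
	index := make(map[int]int, len(ops)) // key -> position in out
	var out []op
	for _, o := range ops {
		if i, ok := index[o.Key]; ok {
			out[i] = o // a later change overrides the earlier one
			continue
		}
		index[o.Key] = len(out)
		out = append(out, o)
	}
	return out
}
```

After compaction, all remaining puts for a table could be written with one batched statement and all remaining deletes with another, which is what makes the merge useful for batching.
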
We must also take secondary unique keys into account; see *execTableBatch* in [executor.go](./executor.go). Currently, we only merge by primary key and batch operations if the table has a primary key and no unique key.

@@ -0,0 +1,233 @@
package loader

import (
	"database/sql"
	"fmt"
	"sync"
	"testing"

	_ "github.com/go-sql-driver/mysql"
	"github.com/juju/errors"
	"github.com/ngaut/log"
)

func getTestDB() (db *sql.DB, err error) {
	dsn := "root:@tcp(127.0.0.1:3306)/?charset=utf8&interpolateParams=true&readTimeout=1m&multiStatements=true"
	db, err = sql.Open("mysql", dsn)
	return
}

func BenchmarkInsertMerge(b *testing.B) {
	benchmarkWrite(b, true)
}

func BenchmarkInsertNoMerge(b *testing.B) {
	benchmarkWrite(b, false)
}

func BenchmarkUpdateMerge(b *testing.B) {
	benchmarkUpdate(b, true)
}

func BenchmarkUpdateNoMerge(b *testing.B) {
	benchmarkUpdate(b, false)
}

func BenchmarkDeleteMerge(b *testing.B) {
	benchmarkDelete(b, true)
}

func BenchmarkDeleteNoMerge(b *testing.B) {
	benchmarkDelete(b, false)
}

func benchmarkUpdate(b *testing.B, merge bool) {
	log.SetLevelByString("error")

	r, err := newRunner(merge)
	if err != nil {
		b.Fatal(err)
	}

	dropTable(r.db, r.loader)
	createTable(r.db, r.loader)

	loadTable(r.db, r.loader, b.N)

	b.ResetTimer()
	updateTable(r.db, r.loader, b.N)

	r.close()
}

func benchmarkDelete(b *testing.B, merge bool) {
	log.SetLevelByString("error")

	r, err := newRunner(merge)
	if err != nil {
		b.Fatal(err)
	}

	dropTable(r.db, r.loader)
	createTable(r.db, r.loader)

	loadTable(r.db, r.loader, b.N)

	b.ResetTimer()
	deleteTable(r.db, r.loader, b.N)

	r.close()
}

func benchmarkWrite(b *testing.B, merge bool) {
	log.SetLevelByString("error")

	r, err := newRunner(merge)
	if err != nil {
		b.Fatal(err)
	}

	dropTable(r.db, r.loader)
	createTable(r.db, r.loader)

	b.ResetTimer()
	loadTable(r.db, r.loader, b.N)

	r.close()
}

type runner struct {
	db     *sql.DB
	loader *Loader
	wg     sync.WaitGroup
}

func newRunner(merge bool) (r *runner, err error) {
	db, err := getTestDB()
	if err != nil {
		return nil, errors.Trace(err)
	}

	loader, err := NewLoader(db, WorkerCount(16), BatchSize(128))
	if err != nil {
		return nil, errors.Trace(err)
	}

	loader.merge = merge

	r = new(runner)
	r.db = db
	r.loader = loader

	r.wg.Add(1)
	go func() {
		err := loader.Run()
		if err != nil {
			log.Fatal(err)
		}
		r.wg.Done()
	}()

	go func() {
		for range loader.Successes() {

		}
	}()

	return
}

func (r *runner) close() {
	r.loader.Close()
	r.wg.Wait()
}

func createTable(db *sql.DB, loader *Loader) error {
	var sql string

	sql = "create table test1(id int primary key, a1 int)"
	// sql = "create table test1(id int, a1 int, UNIQUE KEY `id` (`id`))"
	loader.Input() <- NewDDLTxn("test", "test1", sql)

	return nil
}

func dropTable(db *sql.DB, loader *Loader) error {
	sql := fmt.Sprintf("drop table if exists test1")
	loader.Input() <- NewDDLTxn("test", "test1", sql)
	return nil
}

func loadTable(db *sql.DB, loader *Loader, n int) error {
	var txns []*Txn
	for i := 0; i < n; i++ {
		txn := new(Txn)
		dml := new(DML)
		dml.Database = "test"
		dml.Table = "test1"
		dml.Tp = InsertDMLType
		dml.Values = make(map[string]interface{})
		dml.Values["id"] = i
		dml.Values["a1"] = i

		txn.AppendDML(dml)
		txns = append(txns, txn)
	}

	for _, txn := range txns {
		loader.Input() <- txn
	}

	return nil
}

func updateTable(db *sql.DB, loader *Loader, n int) error {
	var txns []*Txn
	for i := 0; i < n; i++ {
		txn := new(Txn)
		dml := new(DML)
		dml.Database = "test"
		dml.Table = "test1"
		dml.Tp = UpdateDMLType
		dml.OldValues = make(map[string]interface{})
		dml.OldValues["id"] = i
		dml.OldValues["a1"] = i

		dml.Values = make(map[string]interface{})
		dml.Values["id"] = i
		dml.Values["a1"] = i * 10

		txn.AppendDML(dml)
		txns = append(txns, txn)
	}

	for _, txn := range txns {
		loader.Input() <- txn
	}

	return nil
}

func deleteTable(db *sql.DB, loader *Loader, n int) error {
	var txns []*Txn
	for i := 0; i < n; i++ {
		txn := new(Txn)
		dml := new(DML)
		dml.Database = "test"
		dml.Table = "test1"
		dml.Tp = DeleteDMLType
		dml.Values = make(map[string]interface{})
		dml.Values["id"] = i
		dml.Values["a1"] = i

		txn.AppendDML(dml)
		txns = append(txns, txn)
	}

	for _, txn := range txns {
		loader.Input() <- txn
	}

	return nil

}
(TBH the code feels more like "syncer" than "loader" 🙃)