This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

load: restore schemas in parallel #1466

Merged
merged 22 commits into from
Mar 29, 2021

Conversation

@hidehalo (Contributor) commented Feb 26, 2021

What problem does this PR solve?

Issue Number: close #717

What is changed and how it works?

  • add a concurrent jobQueue to handle schema-restore tasks
  • start multiple goroutines as queue consumers
  • produce jobs synchronously
  • add a VERBOSE environment variable to enable verbose mode in shell scripts

Check List

Tests

  • Manual test

Ran the many_tables integration test (300 tables) on a custom TiDB cluster with load balancing.

# show the cluster topology
tiup cluster display local

tidb Cluster: local
tidb Version: v4.0.0
ID                    Role          Host            Ports        OS/Arch       Status   Data Dir                          Deploy Dir
--                    ----          ----            -----        -------       ------   --------                          ----------
192.168.99.103:9093   alertmanager  192.168.99.103  9093/9094    linux/x86_64  Up       /opt/tidb-data/alertmanager-9093  /opt/tidb-deploy/alertmanager-9093
192.168.99.103:3000   grafana       192.168.99.103  3000         linux/x86_64  Up       -                                 /opt/tidb-deploy/grafana-3000
192.168.99.103:2379   pd            192.168.99.103  2379/2380    linux/x86_64  Up|L|UI  /opt/tidb-data/pd-2379            /opt/tidb-deploy/pd-2379
192.168.99.103:9090   prometheus    192.168.99.103  9090         linux/x86_64  Up       /opt/tidb-data/prometheus-9090    /opt/tidb-deploy/prometheus-9090
192.168.99.101:4001   tidb          192.168.99.101  4001/10081   linux/x86_64  Up       -                                 /opt/tidb-deploy/tidb-4001
192.168.99.104:4002   tidb          192.168.99.104  4002/10082   linux/x86_64  Up       -                                 /opt/tidb-deploy/tidb-4002
192.168.99.105:4003   tidb          192.168.99.105  4003/10083   linux/x86_64  Up       -                                 /opt/tidb-deploy/tidb-4003
192.168.99.102:20160  tikv          192.168.99.102  20160/20181  linux/x86_64  Up       /opt/tidb-data/tikv-20160         /opt/tidb-deploy/tikv-20160
192.168.99.106:20161  tikv          192.168.99.106  20161/20182  linux/x86_64  Up       /opt/tidb-data/tikv-20161         /opt/tidb-deploy/tikv-20161
192.168.99.107:20162  tikv          192.168.99.107  20162/20183  linux/x86_64  Up       /opt/tidb-data/tikv-20162         /opt/tidb-deploy/tikv-20162

Benchmark results

# before
[2021/03/04 16:52:18.633 +08:00] [INFO] [loader.go:1294] ["finish to create tables"] [task=test] [unit=load] ["cost time"=8m50.921992526s]
[2021/03/04 17:10:41.238 +08:00] [INFO] [loader.go:1294] ["finish to create tables"] [task=test] [unit=load] ["cost time"=6m19.557692896s]
[2021/03/04 17:50:02.573 +08:00] [INFO] [loader.go:1294] ["finish to create tables"] [task=test] [unit=load] ["cost time"=6m44.54306297s]

# after
[2021/03/04 16:27:28.277 +08:00] [INFO] [loader.go:1519] ["finish to create tables"] [task=test] [unit=load] ["cost time"=5m52.881223232s]
[2021/03/04 17:22:30.894 +08:00] [INFO] [loader.go:1519] ["finish to create tables"] [task=test] [unit=load] ["cost time"=6m27.53465309s]
[2021/03/04 18:10:42.671 +08:00] [INFO] [loader.go:1519] ["finish to create tables"] [task=test] [unit=load] ["cost time"=6m21.283285657s]

Side effects

  • Increased code complexity

Related changes

  • Need to cherry-pick to the release branch
  • Need to be included in the release note

@lance6716
Collaborator

/reward

@ti-challenge-bot

The reward invalid.

@lance6716
Collaborator

/reward 600

@ti-challenge-bot

This PR do not have any linked issue.

Details

Tip :
You need to ensure that the link description follows the following template:

Issue Number: #xxx

Issue Number: close #xxx

About issue link, there is a trace issue.

Warning: None

@lance6716
Collaborator

/reward 600

@ti-challenge-bot

This PR's linked issue is not picked.

@lance6716
Collaborator

/reward 600

@ti-challenge-bot

Reward success.

@lance6716 lance6716 added the needs-cherry-pick-release-2.0 and needs-update-release-note labels Mar 3, 2021
Collaborator

@lance6716 lance6716 left a comment

will review later

Collaborator

@lance6716 lance6716 left a comment

@hidehalo
Contributor Author

@lance6716 PTAL anytime you have time :D

@lance6716
Collaborator

jobQueue needs a unit test

@hidehalo
Contributor Author

jobQueue needs a unit test

OK, I will add some unit tests (use cases) later.

@hidehalo
Contributor Author

@lance6716 reply later...

@hidehalo
Contributor Author

/run-all-tests

@hidehalo
Contributor Author

@lance6716 PTAL

Collaborator

@lance6716 lance6716 left a comment

rest lgtm

@hidehalo
Contributor Author

rest lgtm

👌

@lance6716
Collaborator

@GMHDBJD @lichunzhu PTAL

@lance6716 lance6716 removed the needs-cherry-pick-release-2.0 label Mar 26, 2021
@lance6716
Collaborator

/lgtm

@ti-chi-bot ti-chi-bot added the status/LGT1 label Mar 26, 2021
loader/loader.go Outdated
Comment on lines 1370 to 1374
err = tblRestoreQueue.push(&restoreSchemaJob{
	loader:   l,
	session:  dbSessionPool[dbSessionID],
	database: db,
	table:    table,
	filepath: schemaFile,
})
Contributor

It seems that several jobs will run at the same time using the same MySQL connection. If so, could that cause any problems?

Contributor Author

It's not a problem for now. Would it be better to implement a connection pool with a buffered channel?

Contributor

Yes. Alternatively, we can open a connection for each consumer goroutine and close it after the consumers stop, as we did in Dumpling.
Besides, sql.Rows uses its sql.Conn exclusively. We'd better refine this part.

Contributor Author

Besides, sql.Rows will use sql.Conn exclusively. We'd better refine this part.

I don't get it 🤣. Could you help me understand that? Thanks!

Contributor

Besides, sql.Rows will use sql.Conn exclusively. We'd better refine this part.

	go func() {
		rows, _ := conn.QueryContext(ctx, "select * from pingcap.tb")
		for rows.Next() {
			time.Sleep(time.Minute)
		}
	}()
	time.Sleep(time.Second)
	rows, err := conn.QueryContext(ctx, "select * from pingcap.tb")
	if err != nil {
		return errors.Trace(err)
	}
	for rows.Next() {
	}

I mean this kind of code will cause an error.

[mysql] 2021/03/26 18:20:32 packets.go:446: busy buffer

Contributor Author

Thanks a lot for the reminder! I didn't know about such a deeply hidden problem. Fortunately, our change only performs DDL operations, so it will not trigger concurrent reads on the connection buffer.

Collaborator

this should have been fixed in aae003e @lichunzhu

@hidehalo
Contributor Author

@lichunzhu PTAL

Contributor

@lichunzhu lichunzhu left a comment

Rest LGTM
Great job!

@lichunzhu
Contributor

/lgtm

@ti-chi-bot
Member

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • lance6716
  • lichunzhu

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by writing /lgtm in a comment.
Reviewer can cancel approval by writing /lgtm cancel in a comment.

@ti-chi-bot ti-chi-bot added the status/LGT2 label and removed the status/LGT1 label Mar 29, 2021
@lance6716 lance6716 added the needs-cherry-pick-release-2.0 label Mar 29, 2021
@lichunzhu
Contributor

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 8291b8c

@hidehalo
Contributor Author

/merge

@ti-chi-bot
Member

@hidehalo: /merge is only allowed for the committers in list.

In response to this:

/merge

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@lance6716
Collaborator

/run-all-tests

@ti-chi-bot ti-chi-bot merged commit de474a6 into pingcap:master Mar 29, 2021
ti-srebot pushed a commit to ti-srebot/dm that referenced this pull request Mar 29, 2021
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot

cherry pick to release-2.0 in PR #1544

@ti-srebot ti-srebot added the already-cherry-pick-2.0 label and removed the needs-cherry-pick-release-2.0 label Mar 29, 2021
@lance6716 lance6716 added this to the v2.0.3 milestone Apr 9, 2021
@lance6716 lance6716 modified the milestones: v2.0.3, v2.0.4 May 12, 2021
Labels
already-cherry-pick-2.0, needs-update-release-note, rewarded, size/XL, status/can-merge, status/LGT2
Development

Successfully merging this pull request may close these issues.

Create tables in parallel in load unit
5 participants