
*: make load data atomic by default #18807

Merged
5 commits merged into master on Aug 6, 2020

Conversation

@tina77fritz (Contributor) commented on Jul 27, 2020

Signed-off-by: Tina Fritz tina77fritz@gmail.com

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:
LOAD DATA uses several transactions by default, which breaks transaction atomicity.

What is changed and how it works?

Proposal: xxx

What's Changed:

How it Works:
When tidb_dml_batch_size = 0, LOAD DATA runs in a single transaction and is therefore atomic. This PR makes 0 the default value.
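For illustration, a minimal SQL sketch of the resulting behavior; the table name, file path, and the non-zero batch size are placeholders, not something this PR prescribes:

```sql
-- With the new default (tidb_dml_batch_size = 0) the whole file is loaded in a
-- single transaction, so the statement is atomic: either every row is committed
-- or the table is left unchanged.
SET tidb_dml_batch_size = 0;
LOAD DATA LOCAL INFILE '/tmp/data.csv' INTO TABLE t FIELDS TERMINATED BY ',';

-- The old behavior of committing in batches can still be requested explicitly,
-- at the cost of atomicity.
SET tidb_dml_batch_size = 20000;
LOAD DATA LOCAL INFILE '/tmp/data.csv' INTO TABLE t FIELDS TERMINATED BY ',';
```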

Related changes

Check List

Tests

  • Unit test
  • Manual test
  1. Load a large file.
  2. Check the server log:
[2020/07/27 23:07:59.160 +08:00] [INFO] [server.go:388] ["new connection"] [conn=5] [remoteAddr=127.0.0.1:46754]
[2020/07/27 23:08:52.888 +08:00] [INFO] [2pc.go:285] ["[BIG_TXN]"] [con=5] ["table ID"=50] [size=3921322] [keys=135218] [puts=135218] [dels=0] [locks=0] [checks=0] [txnStartTS=418345787592015872]
[2020/07/27 23:08:52.892 +08:00] [INFO] [2pc.go:394] ["2PC detect large amount of mutations on a single region"] [region=44] ["mutations count"=135218]
[2020/07/27 23:08:53.022 +08:00] [INFO] [2pc.go:394] ["2PC detect large amount of mutations on a single region"] [region=49] ["mutations count"=135218]
[2020/07/27 23:08:53.029 +08:00] [INFO] [load_data.go:255] ["commit one task success"] [conn=5] ["commit time usage"=901.4976ms] ["keys processed"=135218] ["tasks processed"=1] ["tasks in queue"=0]

Side effects

  • Performance regression
    • Consumes more memory

Release note

  • No release note

Signed-off-by: Tina Fritz <tina77fritz@gmail.com>
@tina77fritz requested a review from a team as a code owner on July 27, 2020 10:10
@tina77fritz requested a review from wshwsh12 and removed the request for a team on July 27, 2020 10:10
@ti-srebot added the contribution label on Jul 27, 2020
@github-actions bot added the sig/execution label on Jul 27, 2020
Signed-off-by: Tina Fritz <tina77fritz@gmail.com>
@tina77fritz (Contributor, Author)

/run-check_dev_2

@lysu requested reviews from jackysp and cfzjywxk on July 28, 2020 03:14
@cfzjywxk (Contributor)

Loading a lot of data can easily use a lot of memory, so I think we'd better make it atomic only after the memory improvement task #17479 is done. @lysu @jackysp @imtbkcat What do you think?

@tina77fritz (Contributor, Author)

> Loading a lot of data can easily use a lot of memory, so I think we'd better make it atomic only after the memory improvement task #17479 is done. @lysu @jackysp @imtbkcat What do you think?

Emm... it will hit the transaction size limit if the transaction is really large, and the user can then make the choice themselves (a sketch follows the list):

  1. enlarge the transaction size limit
  2. set tidb_dml_batch_size = xxx
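For concreteness, a rough sketch of option 2; the file path, table name, and batch value are illustrative, and option 1 is a server-side change to the transaction size limit in the TiDB configuration rather than a session variable:

```sql
-- If the single atomic transaction exceeds the transaction size limit, the whole
-- load fails and the table is left unchanged. Falling back to batched commits
-- bounds the per-transaction size but gives up atomicity.
SET tidb_dml_batch_size = 20000;
LOAD DATA LOCAL INFILE '/tmp/huge.csv' INTO TABLE t FIELDS TERMINATED BY ',';
```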

@cfzjywxk (Contributor)

> Emm... it will hit the transaction size limit if the transaction is really large, and the user can then make the choice themselves:
>
>   1. enlarge the transaction size limit
>   2. set tidb_dml_batch_size = xxx

The executor itself also uses memory: for example, data processing and caching consume some memory before the rows are written into the transaction memory buffer. We could run some loading tests to check memory usage, as in #15369. Could you help with this check and with the memory usage optimization? Many thanks.

@tina77fritz (Contributor, Author)

> The executor itself also uses memory: for example, data processing and caching consume some memory before the rows are written into the transaction memory buffer. We could run some loading tests to check memory usage, as in #15369. Could you help with this check and with the memory usage optimization? Many thanks.

I may not have time to optimize the memory usage for it :)
I just think it is terrible when a DBMS loses the atomicity of its transactions.

@jackysp (Member) left a comment

LGTM

@ti-srebot added the status/LGT1 label on Jul 30, 2020
@jackysp added the sig/transaction label on Jul 31, 2020
@tina77fritz (Contributor, Author)

I think the memory usage optimization from @bobotu has been finished. Could this PR move forward now? @cfzjywxk

@cfzjywxk (Contributor) commented on Aug 5, 2020

> I think the memory usage optimization from @bobotu has been finished. Could this PR move forward now? @cfzjywxk

Yes. I think that after we change the default to a single transaction, we need to change the load logic from
"prepare all the data in memory, submit the task to the commit goroutine to do the batch check and insert, then write into the transaction buffer"
to
"load and prepare the data and write it into the transaction memory buffer at the same time".
Then we no longer hold all of the row content in memory and increase the OOM risk, and the large transaction's memory will be controlled and may spill to disk in the future.
Would you like to help us with this? Thanks a lot~

@cfzjywxk (Contributor) commented on Aug 5, 2020

LGTM

@ti-srebot removed the status/LGT1 label on Aug 5, 2020
@ti-srebot added the status/LGT2 label on Aug 5, 2020
@cfzjywxk (Contributor) commented on Aug 6, 2020

/merge

@ti-srebot added the status/can-merge label on Aug 6, 2020
@ti-srebot (Contributor)

/run-all-tests

@ti-srebot (Contributor)

@tina77fritz merge failed.

@codecov (bot) commented on Aug 6, 2020

Codecov Report

Merging #18807 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #18807   +/-   ##
===========================================
  Coverage   79.2776%   79.2776%           
===========================================
  Files           546        546           
  Lines        148632     148632           
===========================================
  Hits         117832     117832           
  Misses        21309      21309           
  Partials       9491       9491           

@bobotu (Contributor) commented on Aug 6, 2020

/merge

@ti-srebot (Contributor)

/run-all-tests

@ti-srebot (Contributor)

/run-all-tests

@jackysp (Member) commented on Aug 6, 2020

/merge

@ti-srebot (Contributor)

/run-all-tests

Labels
  • component/executor
  • contribution (This PR is from a community contributor.)
  • sig/execution (SIG execution)
  • sig/transaction (SIG: Transaction)
  • status/can-merge (Indicates a PR has been approved by a committer.)
  • status/LGT2 (Indicates that a PR has LGTM 2.)
Projects
None yet

5 participants