
*: make load data atomic by default #18807

Merged
5 commits merged into master on Aug 6, 2020

Conversation

@tina77fritz (Contributor) commented on Jul 27, 2020

Signed-off-by: Tina Fritz tina77fritz@gmail.com

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:
LOAD DATA uses several transactions by default, which breaks transaction atomicity.

What is changed and how it works?

Proposal: xxx

What's Changed:

How it Works:
When tidb_dml_batch_size = 0, LOAD DATA runs in a single transaction and is therefore atomic. This PR makes 0 the default value.
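For illustration, a minimal SQL sketch of the resulting behavior; the table name, file path, and the non-zero batch size are placeholders, not something this PR prescribes:

```sql
-- With the new default (tidb_dml_batch_size = 0) the whole file is loaded in a
-- single transaction, so the statement is atomic: either every row is committed
-- or the table is left unchanged.
SET tidb_dml_batch_size = 0;
LOAD DATA LOCAL INFILE '/tmp/data.csv' INTO TABLE t FIELDS TERMINATED BY ',';

-- The old behavior of committing in batches can still be requested explicitly,
-- at the cost of atomicity.
SET tidb_dml_batch_size = 20000;
LOAD DATA LOCAL INFILE '/tmp/data.csv' INTO TABLE t FIELDS TERMINATED BY ',';
```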

Related changes

Check List

Tests

  • Unit test
  • Manual test
  1. Load a large file.
  2. Check the server log:
[2020/07/27 23:07:59.160 +08:00] [INFO] [server.go:388] ["new connection"] [conn=5] [remoteAddr=127.0.0.1:46754]
[2020/07/27 23:08:52.888 +08:00] [INFO] [2pc.go:285] ["[BIG_TXN]"] [con=5] ["table ID"=50] [size=3921322] [keys=135218] [puts=135218] [dels=0] [locks=0] [checks=0] [txnStartTS=418345787592015872]
[2020/07/27 23:08:52.892 +08:00] [INFO] [2pc.go:394] ["2PC detect large amount of mutations on a single region"] [region=44] ["mutations count"=135218]
[2020/07/27 23:08:53.022 +08:00] [INFO] [2pc.go:394] ["2PC detect large amount of mutations on a single region"] [region=49] ["mutations count"=135218]
[2020/07/27 23:08:53.029 +08:00] [INFO] [load_data.go:255] ["commit one task success"] [conn=5] ["commit time usage"=901.4976ms] ["keys processed"=135218] ["tasks processed"=1] ["tasks in queue"=0]

Side effects

  • Performance regression
    • Consumes more memory

Release note

  • No release note

Signed-off-by: Tina Fritz <tina77fritz@gmail.com>
@tina77fritz requested a review from a team as a code owner on July 27, 2020 10:10
@tina77fritz requested a review from wshwsh12 and removed the request for a team on July 27, 2020 10:10
@ti-srebot added the contribution label on Jul 27, 2020
@github-actions bot added the sig/execution label on Jul 27, 2020
Signed-off-by: Tina Fritz <tina77fritz@gmail.com>
@tina77fritz (Contributor, Author)

/run-check_dev_2

@lysu requested reviews from jackysp and cfzjywxk on July 28, 2020 03:14
@cfzjywxk (Contributor)

Loading a lot of data can easily use a lot of memory, so I think we'd better make it atomic only after the memory improvement task #17479 is done. @lysu @jackysp @imtbkcat What do you think?

@tina77fritz (Contributor, Author)

> Loading a lot of data can easily use a lot of memory, so I think we'd better make it atomic only after the memory improvement task #17479 is done. @lysu @jackysp @imtbkcat What do you think?

Emm... it will hit the transaction size limit if the transaction is really large, and the user can then make the choice themselves (a sketch follows the list):

  1. enlarge the transaction size limit
  2. set tidb_dml_batch_size = xxx
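For concreteness, a rough sketch of option 2; the file path, table name, and batch value are illustrative, and option 1 is a server-side change to the transaction size limit in the TiDB configuration rather than a session variable:

```sql
-- If the single atomic transaction exceeds the transaction size limit, the whole
-- load fails and the table is left unchanged. Falling back to batched commits
-- bounds the per-transaction size but gives up atomicity.
SET tidb_dml_batch_size = 20000;
LOAD DATA LOCAL INFILE '/tmp/huge.csv' INTO TABLE t FIELDS TERMINATED BY ',';
```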

@cfzjywxk (Contributor)

> Emm... it will hit the transaction size limit if the transaction is really large, and the user can then make the choice themselves:
>
>   1. enlarge the transaction size limit
>   2. set tidb_dml_batch_size = xxx

The executor itself also uses memory: for example, data processing and caching consume some memory before the rows are written into the transaction memory buffer. We could run some loading tests to check memory usage, as in #15369. Could you help with this check and with the memory usage optimization? Many thanks.

@tina77fritz (Contributor, Author)

> The executor itself also uses memory: for example, data processing and caching consume some memory before the rows are written into the transaction memory buffer. We could run some loading tests to check memory usage, as in #15369. Could you help with this check and with the memory usage optimization? Many thanks.

I may not have time to optimize the memory usage for it :)
I just think it is terrible when a DBMS loses the atomicity of its transactions.

@jackysp (Member) left a comment

LGTM

@ti-srebot added the status/LGT1 label on Jul 30, 2020
@jackysp added the sig/transaction label on Jul 31, 2020
@tina77fritz (Contributor, Author)

I think the memory usage optimization from @bobotu has been finished. Could this PR move forward now? @cfzjywxk

@cfzjywxk (Contributor) commented on Aug 5, 2020

> I think the memory usage optimization from @bobotu has been finished. Could this PR move forward now? @cfzjywxk

Yes. I think that after we change the default to a single transaction, we need to change the load logic from
"prepare all the data in memory, submit the task to the commit goroutine to do the batch check and insert, then write into the transaction buffer"
to
"load and prepare the data and write it into the transaction memory buffer at the same time".
Then we no longer hold all of the row content in memory and increase the OOM risk, and the large transaction's memory will be controlled and may spill to disk in the future.
Would you like to help us with this? Thanks a lot~

@cfzjywxk (Contributor) commented on Aug 5, 2020

LGTM

@ti-srebot removed the status/LGT1 label on Aug 5, 2020
@ti-srebot added the status/LGT2 label on Aug 5, 2020
@cfzjywxk (Contributor) commented on Aug 6, 2020

/merge

@ti-srebot added the status/can-merge label on Aug 6, 2020
@ti-srebot (Contributor)

/run-all-tests

@ti-srebot (Contributor)

@tina77fritz merge failed.

@codecov (bot) commented on Aug 6, 2020

Codecov Report

Merging #18807 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #18807   +/-   ##
===========================================
  Coverage   79.2776%   79.2776%           
===========================================
  Files           546        546           
  Lines        148632     148632           
===========================================
  Hits         117832     117832           
  Misses        21309      21309           
  Partials       9491       9491           

@bobotu (Contributor) commented on Aug 6, 2020

/merge

@ti-srebot (Contributor)

/run-all-tests

@ti-srebot (Contributor)

/run-all-tests

@jackysp (Member) commented on Aug 6, 2020

/merge

@ti-srebot (Contributor)

/run-all-tests

Labels
  • component/executor
  • contribution (This PR is from a community contributor.)
  • sig/execution (SIG execution)
  • sig/transaction (SIG: Transaction)
  • status/can-merge (Indicates a PR has been approved by a committer.)
  • status/LGT2 (Indicates that a PR has LGTM 2.)
Projects
None yet

5 participants